I set up TensorBox on a 64-bit GPU machine running Ubuntu 14.04. TensorFlow is installed on the machine and works fine.
When I run
python train.py --hypes hypes/lstm_rezoom.json --gpu 0 --logdir output
from the TensorBox directory (which should start retraining the network on the specified data directory), I get the following error:
Traceback (most recent call last):
  File "~/.local/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 85, in __call__
    ret = func(*args)
  File "train.py", line 390, in log_image
    rnn_len=H['rnn_len'])[0]
  File "~/tensorbox/utils/train_utils.py", line 127, in add_rectangles
    from stitch_wrapper import stitch_rects
ImportError: ~/tensorbox/utils/stitch_wrapper.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8
W tensorflow/core/framework/op_kernel.cc:975] Internal: Failed to run py callback pyfunc_1: see error log.
W tensorflow/core/framework/op_kernel.cc:975] Internal: Failed to run py callback pyfunc_1: see error log.
[[Node: PyFunc_1 = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_1", _device="/job:localhost/replica:0/task:0/cpu:0"](fifo_queue_1_DequeueMany, strided_slice_12, strided_slice_13, Variable/read, PyFunc_1/input_4)]]
W tensorflow/core/framework/op_kernel.cc:975] Internal: Failed to run py callback pyfunc_1: see error log.
[[Node: PyFunc_1 = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_1", _device="/job:localhost/replica:0/task:0/cpu:0"](fifo_queue_1_DequeueMany, strided_slice_12, strided_slice_13, Variable/read, PyFunc_1/input_4)]]
Traceback (most recent call last):
  File "~/.local/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 85, in __call__
    ret = func(*args)
  File "train.py", line 390, in log_image
    rnn_len=H['rnn_len'])[0]
  File "~/tensorbox/utils/train_utils.py", line 127, in add_rectangles
    from stitch_wrapper import stitch_rects
ImportError: ~/tensorbox/utils/stitch_wrapper.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8
W tensorflow/core/framework/op_kernel.cc:975] Internal: Failed to run py callback pyfunc_0: see error log.
W tensorflow/core/kernels/queue_base.cc:294] _0_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
W tensorflow/core/kernels/queue_base.cc:294] _1_fifo_queue_1: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "train.py", line 549, in <module>
    main()
  File "train.py", line 546, in main
    train(H, test_images=[])
  File "train.py", line 504, in train
    ], feed_dict=lr_feed)
  File "~/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "~/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "~/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "~/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Failed to run py callback pyfunc_1: see error log.
[[Node: PyFunc_1 = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_1", _device="/job:localhost/replica:0/task:0/cpu:0"](fifo_queue_1_DequeueMany, strided_slice_12, strided_slice_13, Variable/read, PyFunc_1/input_4)]]
Caused by op u'PyFunc_1', defined at:
  File "train.py", line 549, in <module>
    main()
  File "train.py", line 546, in main
    train(H, test_images=[])
  File "train.py", line 448, in train
    smooth_op, global_step, learning_rate) = build(H, q)
  File "train.py", line 402, in build
    [tf.float32])
  File "~/.local/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 192, in py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "~/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_script_ops.py", line 40, in _py_func
    name=name)
  File "~/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "~/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "~/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()
InternalError (see above for traceback): Failed to run py callback pyfunc_1: see error log.
[[Node: PyFunc_1 = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_1", _device="/job:localhost/replica:0/task:0/cpu:0"](fifo_queue_1_DequeueMany, strided_slice_12, strided_slice_13, Variable/read, PyFunc_1/input_4)]]
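I can't figure out what the problem is. I have checked that the files it accesses (e.g. the .so) really are at the expected paths, and I have tried some changes to train.py itself that others suggested on help pages (e.g. setting state_is_tuple to False on lines 39 and 41), but none of them seems to be the cause.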
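The checks described above (the .so is in place, train.py is not at fault) can be narrowed down with a small diagnostic script. Its premise is an assumption worth testing: an undefined PyUnicodeUCS2_* symbol commonly means the extension was compiled against a narrow-unicode (UCS2) Python 2 build while the interpreter importing it is a wide (UCS4) build, which is what Ubuntu's stock Python 2.7 is. The sketch below reports the running interpreter's build, scans the .so for UCS2/UCS4 symbol prefixes with a plain byte search (a rough stand-in for running nm), and retries the failing import outside TensorFlow. All function names are hypothetical helpers, not part of TensorBox; only the ~/tensorbox/utils path and the stitch_wrapper module name come from the log.

```python
import importlib
import os
import sys


def interpreter_unicode_build():
    """Return "UCS2" for a narrow Python 2 build, "UCS4" otherwise.

    Python 2 narrow builds report sys.maxunicode == 0xFFFF; wide builds
    (and every Python 3.3+ interpreter) report 0x10FFFF.
    """
    return "UCS2" if sys.maxunicode == 0xFFFF else "UCS4"


def ucs_variants_in_bytes(data):
    """Return which PyUnicodeUCS2_/PyUnicodeUCS4_ prefixes occur in raw bytes."""
    found = set()
    for variant in ("UCS2", "UCS4"):
        if ("PyUnicode" + variant + "_").encode("ascii") in data:
            found.add(variant)
    return found


def check_extension(so_path):
    """Hypothetical helper: compare a compiled extension with this interpreter."""
    if not os.path.isfile(so_path):
        return "missing: %s" % so_path
    with open(so_path, "rb") as f:
        needed = ucs_variants_in_bytes(f.read())
    have = interpreter_unicode_build()
    if needed and have not in needed:
        return "mismatch: extension references %s symbols, interpreter is %s" % (
            "/".join(sorted(needed)), have)
    return "ok"


def try_import(module_dir, module_name):
    """Import module_name from module_dir; return None on success, else the error."""
    sys.path.insert(0, module_dir)
    try:
        importlib.import_module(module_name)
        return None
    except ImportError as e:
        return str(e)
    finally:
        sys.path.pop(0)  # undo the temporary path entry


if __name__ == "__main__":
    # Path taken from the traceback; adjust if TensorBox lives elsewhere.
    utils_dir = os.path.expanduser("~/tensorbox/utils")
    print("interpreter: %s (%s)" % (sys.executable, interpreter_unicode_build()))
    print("stitch_wrapper.so: %s"
          % check_extension(os.path.join(utils_dir, "stitch_wrapper.so")))
    print("import: %s" % (try_import(utils_dir, "stitch_wrapper") or "ok"))
```

Run it with the same python that runs train.py. A "mismatch" line would point at how stitch_wrapper.so was built rather than at train.py or the state_is_tuple setting.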