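First of all, I'm new to TensorFlow. I'm working on an OCR project that uses CTC (Connectionist Temporal Classification) and LSTM (Long Short-Term Memory). Training has finished, but when I try to restore the session I get an error. This error is often posted online, but the analyses offered for it differ.
The error is: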
2018-01-10 13:42:43.179534: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-10 13:42:43.179939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Quadro M4000
major: 5 minor: 2 memoryClockRate (GHz) 0.7725
pciBusID 0000:00:05.0
Total memory: 7.93GiB
Free memory: 7.56GiB
2018-01-10 13:42:43.179974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2018-01-10 13:42:43.179986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2018-01-10 13:42:43.180002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M4000, pci bus id: 0000:00:05.0)
2018-01-10 13:42:43.316563: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.319682: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.332996: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.333927: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.334583: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.379830: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380081: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380189: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380188: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380343: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380554: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.415117: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
Traceback (most recent call last):
File "detect.py", line 62, in <module>
print(detect(test_inputs, test_targets, test_seq_len))
File "detect.py", line 23, in detect
saver.restore(sess,'models/ocr.model-100000')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1548, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
Caused by op u'save/Assign_1', defined at:
File "detect.py", line 62, in <module>
print(detect(test_inputs, test_targets, test_seq_len))
File "detect.py", line 20, in detect
saver = tf.train.Saver()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1139, in __init__
self.build()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1170, in build
restore_sequentially=self._restore_sequentially)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 691, in build
restore_sequentially, reshape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
assign_ops.append(saveable.restore(tensors, shapes))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 155, in restore
self.op.get_shape().is_fully_defined())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/state_ops.py", line 271, in assign
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 45, in assign
use_locking=use_locking, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
[[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
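I traced the error to the call saver.restore(sess,'models/ocr.model-100000').
This error can have many causes. So far I have done the following:
- Deleted all checkpoints saved by previous training runs and trained again from scratch, but that did not help.
- Used the function print_tensors_in_checkpoint_file provided by TensorFlow, and the checkpoint looks fine to me.
Here is the output: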
Variable []
Variable_1 [5, 5, 1, 48]
Variable_1/Momentum [5, 5, 1, 48]
Variable_2 [48]
Variable_2/Momentum [48]
Variable_3 [5, 5, 48, 64]
Variable_3/Momentum [5, 5, 48, 64]
Variable_4 [64]
Variable_4/Momentum [64]
Variable_5 [5, 5, 64, 128]
Variable_5/Momentum [5, 5, 64, 128]
Variable_6 [128]
Variable_6/Momentum [128]
Variable_7 [65536, 256]
Variable_7/Momentum [65536, 256]
Variable_8 [256]
Variable_8/Momentum [256]
W [128, 38]
W/Momentum [128, 38]
b [38]
b/Momentum [38]
rnn/multi_rnn_cell/cell_0/lstm_cell/bias [512]
rnn/multi_rnn_cell/cell_0/lstm_cell/bias/Momentum [512]
rnn/multi_rnn_cell/cell_0/lstm_cell/kernel [129, 512]
rnn/multi_rnn_cell/cell_0/lstm_cell/kernel/Momentum [129, 512]
rnn/multi_rnn_cell/cell_1/lstm_cell/bias [512]
rnn/multi_rnn_cell/cell_1/lstm_cell/bias/Momentum [512]
rnn/multi_rnn_cell/cell_1/lstm_cell/kernel [256, 512]
rnn/multi_rnn_cell/cell_1/lstm_cell/kernel/Momentum [256, 512]
[<tf.Variable 'Variable:0' shape=(5, 5, 1, 48) dtype=float32_ref>, <tf.Variable 'Variable_1:0' shape=(48,) dtype=float32_ref>, <tf.Variable 'Variable_2:0' shape=(5, 5, 48, 64) dtype=float32_ref>, <tf.Variable 'Variable_3:0' shape=(64,) dtype=float32_ref>, <tf.Variable 'Variable_4:0' shape=(5, 5, 64, 128) dtype=float32_ref>, <tf.Variable 'Variable_5:0' shape=(128,) dtype=float32_ref>, <tf.Variable 'Variable_6:0' shape=(65536, 256) dtype=float32_ref>, <tf.Variable 'Variable_7:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0' shape=(129, 512) dtype=float32_ref>, <tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/kernel:0' shape=(256, 512) dtype=float32_ref>, <tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'W:0' shape=(128, 38) dtype=float32_ref>, <tf.Variable 'b:0' shape=(38,) dtype=float32_ref>]
What I'd like to understand is how the saver determines the shapes, and how to debug this.
Answer 0 (score: 0)
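At some point you seem to have changed the order of the variables in your graph definition. In the saved checkpoint, [5, 5, 1, 48] is the shape of Variable_1 and [48] is the shape of Variable_2. In your current graph, however, Variable_1 is the [48] bias. That is exactly the lhs/rhs pair the Assign error complains about. The checkpoint also starts with a scalar Variable of shape [] that your current graph does not create, so every unnamed variable is shifted by one position.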
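As for how the saver gets the shapes: judging from the log, tf.train.Saver() builds a restore op (save/RestoreV2_1) for each variable in the current graph. That op reads the tensor saved under the same name. It then runs an Assign with validate_shape=true (save/Assign_1), so restoring fails as soon as one name points to a tensor of a different shape. A quick way to debug this is to diff the two name-to-shape listings. The sketch below is plain Python, so it runs without TensorFlow. It assumes that, in TF 1.x, the checkpoint shapes come from tf.train.NewCheckpointReader(path).get_variable_to_shape_map() and the graph shapes come from {v.op.name: v.get_shape().as_list() for v in tf.global_variables()}. The helper name find_restore_problems is hypothetical.

```python
def find_restore_problems(ckpt_shapes, graph_shapes):
    """List the graph variables a default tf.train.Saver() could not restore.

    Returns (name, checkpoint_shape, graph_shape) tuples; checkpoint_shape is
    None when the name does not exist in the checkpoint at all. Extra entries
    in the checkpoint (e.g. */Momentum slots) are ignored, because only the
    variables of the current graph are restored.
    """
    problems = []
    for name in sorted(graph_shapes):
        expected = list(graph_shapes[name])
        saved = ckpt_shapes.get(name)
        if saved is None:
            problems.append((name, None, expected))
        elif list(saved) != expected:
            problems.append((name, list(saved), expected))
    return problems


# Shapes copied from the question (a subset, for brevity).
checkpoint = {
    "Variable": [],
    "Variable_1": [5, 5, 1, 48],
    "Variable_1/Momentum": [5, 5, 1, 48],
    "Variable_2": [48],
    "W": [128, 38],
}
graph = {"Variable": [5, 5, 1, 48], "Variable_1": [48], "W": [128, 38]}

for name, saved, expected in find_restore_problems(checkpoint, graph):
    print("%s: checkpoint %s vs graph %s" % (name, saved, expected))
```

With the question's shapes, this reports Variable and Variable_1 but not W. Explicitly named variables line up, while the unnamed ones are off by one.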
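The names show that you did not give your variables explicit names, so TensorFlow named them Variable, Variable_1, Variable_2, and so on. The suffix depends on the order in which TensorFlow encounters the variables, so if you swap two variables in your code, they get different names. After that, a default tf.train.Saver() can no longer restore a previously saved checkpoint, because TensorFlow finds a different tensor under the same name.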
Best practice is to give every variable an explicit name through the name argument:
W_conv1 = tf.Variable(..., name='W_conv1')
b_conv1 = tf.Variable(..., name='b_conv1')
...
This way, your code is more robust to small changes in the model definition.
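If retraining is expensive, the existing checkpoint can probably still be restored. One way is to give tf.train.Saver a dict that maps checkpoint names to graph variables, since in TF 1.x its var_list argument accepts such a dict. Judging from the two listings in the question, every default name in the current graph sits one index below its checkpoint counterpart. For example, graph Variable_6 (65536, 256) corresponds to checkpoint Variable_7. Meanwhile, W, b and the rnn/... variables already match. The helper below computes that renaming. Its name, shift_default_name, is hypothetical. The TensorFlow calls appear only in a comment and are an untested assumption.

```python
import re


def shift_default_name(name, offset, base="Variable"):
    """Map a default name such as 'Variable_6' to 'Variable_7' (offset=1).
    Explicitly named variables (W, b, rnn/...) are returned unchanged."""
    match = re.match(r"^%s(?:_(\d+))?$" % re.escape(base), name)
    if match is None:
        return name
    index = int(match.group(1) or 0) + offset
    return base if index == 0 else "%s_%d" % (base, index)


# Assumed TF 1.x usage in detect.py, after the graph has been built:
#   var_list = {shift_default_name(v.op.name, 1): v for v in tf.global_variables()}
#   saver = tf.train.Saver(var_list=var_list)
#   saver.restore(sess, 'models/ocr.model-100000')

for graph_name in ["Variable", "Variable_6", "W",
                   "rnn/multi_rnn_cell/cell_0/lstm_cell/bias"]:
    print("%s -> %s" % (graph_name, shift_default_name(graph_name, 1)))
```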