InvalidArgumentError on restore: Assign requires shapes of both tensors to match

Time: 2018-01-10 11:34:52

Tags: python tensorflow lstm tensorflow-serving

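First, I should mention that I am new to TensorFlow. I am working on an OCR project using CTC (Connectionist Temporal Classification) and LSTM (Long Short-Term Memory). Training has finished, but when I try to restore the session I get an error that is widely reported online, though with differing explanations.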

The error is:

 2018-01-10 13:42:43.179534: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-10 13:42:43.179939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: Quadro M4000
major: 5 minor: 2 memoryClockRate (GHz) 0.7725
pciBusID 0000:00:05.0
Total memory: 7.93GiB
Free memory: 7.56GiB
2018-01-10 13:42:43.179974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2018-01-10 13:42:43.179986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2018-01-10 13:42:43.180002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M4000, pci bus id: 0000:00:05.0)
2018-01-10 13:42:43.316563: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.319682: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.332996: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.333927: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.334583: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.379830: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380081: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380189: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380188: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380343: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380554: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.415117: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
Traceback (most recent call last):
  File "detect.py", line 62, in <module>
    print(detect(test_inputs, test_targets, test_seq_len))
  File "detect.py", line 23, in detect
    saver.restore(sess,'models/ocr.model-100000')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1548, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]

Caused by op u'save/Assign_1', defined at:
  File "detect.py", line 62, in <module>
    print(detect(test_inputs, test_targets, test_seq_len))
  File "detect.py", line 20, in detect
    saver = tf.train.Saver()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1139, in __init__
    self.build()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1170, in build
    restore_sequentially=self._restore_sequentially)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 691, in build
    restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 155, in restore
    self.op.get_shape().is_fully_defined())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/state_ops.py", line 271, in assign
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 45, in assign
    use_locking=use_locking, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]

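I traced the error to the call saver.restore(sess,'models/ocr.model-100000').

It could have a number of causes. Here is what I have tried so far:

  • Deleted all checkpoints saved from previous training runs and retrained from scratch, but that did not help.

  • Used the print_tensors_in_checkpoint_file function that TensorFlow provides; the checkpoint looks fine to me.

Here is the output: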

Variable    []
Variable_1  [5, 5, 1, 48]
Variable_1/Momentum [5, 5, 1, 48]
Variable_2  [48]
Variable_2/Momentum [48]
Variable_3  [5, 5, 48, 64]
Variable_3/Momentum [5, 5, 48, 64]
Variable_4  [64]
Variable_4/Momentum [64]
Variable_5  [5, 5, 64, 128]
Variable_5/Momentum [5, 5, 64, 128]
Variable_6  [128]
Variable_6/Momentum [128]
Variable_7  [65536, 256]
Variable_7/Momentum [65536, 256]
Variable_8  [256]
Variable_8/Momentum [256]
W   [128, 38]
W/Momentum  [128, 38]
b   [38]
b/Momentum  [38]
rnn/multi_rnn_cell/cell_0/lstm_cell/bias    [512]
rnn/multi_rnn_cell/cell_0/lstm_cell/bias/Momentum   [512]
rnn/multi_rnn_cell/cell_0/lstm_cell/kernel  [129, 512]
rnn/multi_rnn_cell/cell_0/lstm_cell/kernel/Momentum [129, 512]
rnn/multi_rnn_cell/cell_1/lstm_cell/bias    [512]
rnn/multi_rnn_cell/cell_1/lstm_cell/bias/Momentum   [512]
rnn/multi_rnn_cell/cell_1/lstm_cell/kernel  [256, 512]
rnn/multi_rnn_cell/cell_1/lstm_cell/kernel/Momentum [256, 512]
[<tf.Variable 'Variable:0' shape=(5, 5, 1, 48) dtype=float32_ref>, <tf.Variable 'Variable_1:0' shape=(48,) dtype=float32_ref>, <tf.Variable 'Variable_2:0' shape=(5, 5, 48, 64) dtype=float32_ref>, <tf.Variable 'Variable_3:0' shape=(64,) dtype=float32_ref>, <tf.Variable 'Variable_4:0' shape=(5, 5, 64, 128) dtype=float32_ref>, <tf.Variable 'Variable_5:0' shape=(128,) dtype=float32_ref>, <tf.Variable 'Variable_6:0' shape=(65536, 256) dtype=float32_ref>, <tf.Variable 'Variable_7:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0' shape=(129, 512) dtype=float32_ref>, <tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/kernel:0' shape=(256, 512) dtype=float32_ref>, <tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'W:0' shape=(128, 38) dtype=float32_ref>, <tf.Variable 'b:0' shape=(38,) dtype=float32_ref>]

What I would like to understand is how the saver determines the shapes, and how to debug this code.

1 Answer:

Answer 0 (score: 0)

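It seems that at some point you changed the order of the variables in your graph definition: in the saved checkpoint, [5, 5, 1, 48] is the shape of Variable_1 and [48] is the shape of Variable_2, whereas the graph you are restoring into has a Variable_1 of shape [48].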
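The mismatch can also be checked mechanically. The following is a minimal sketch in plain Python, assuming the shapes are copied by hand from the two listings in the question: the checkpoint listing, and the list of tf.Variable objects printed after it, which presumably reflects the graph built in detect.py. The helpers `find_shape_mismatches` and `names_with_shape` are hypothetical names for illustration, not TensorFlow functions, and the Momentum slots and LSTM variables are left out for brevity.

```python
# Shapes copied from the checkpoint listing in the question
# (Momentum slots and rnn/... entries omitted for brevity).
checkpoint_shapes = {
    "Variable": [],
    "Variable_1": [5, 5, 1, 48],
    "Variable_2": [48],
    "Variable_3": [5, 5, 48, 64],
    "Variable_4": [64],
    "Variable_5": [5, 5, 64, 128],
    "Variable_6": [128],
    "Variable_7": [65536, 256],
    "Variable_8": [256],
    "W": [128, 38],
    "b": [38],
}

# Shapes copied from the list of tf.Variable objects in the question.
graph_shapes = {
    "Variable": [5, 5, 1, 48],
    "Variable_1": [48],
    "Variable_2": [5, 5, 48, 64],
    "Variable_3": [64],
    "Variable_4": [5, 5, 64, 128],
    "Variable_5": [128],
    "Variable_6": [65536, 256],
    "Variable_7": [256],
    "W": [128, 38],
    "b": [38],
}


def find_shape_mismatches(graph, checkpoint):
    """Return (name, graph_shape, saved_shape) for each graph variable whose
    shape differs from the checkpoint entry stored under the same name.
    saved_shape is None when the checkpoint has no entry with that name."""
    mismatches = []
    for name in sorted(graph):
        saved = checkpoint.get(name)
        if saved != graph[name]:
            mismatches.append((name, graph[name], saved))
    return mismatches


def names_with_shape(shapes, wanted):
    """Return the names in `shapes` whose shape equals `wanted`."""
    return sorted(name for name, shape in shapes.items() if shape == wanted)


mismatches = find_shape_mismatches(graph_shapes, checkpoint_shapes)
for name, graph_shape, saved_shape in mismatches:
    # Show which checkpoint entries would fit the graph variable instead.
    candidates = names_with_shape(checkpoint_shapes, graph_shape)
    print("%s: graph %s, checkpoint %s, same shape saved as %s"
          % (name, graph_shape, saved_shape, candidates))
```

With these inputs, every auto-named variable from Variable to Variable_7 is reported, and each one finds its shape in the checkpoint under the name with the next higher suffix, while W and b match. The checkpoint also starts with a scalar Variable that the graph listing does not contain.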

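The names indicate that you did not give your variables explicit names, so they were named Variable, Variable_1, Variable_2, and so on. The suffix is determined by the order in which TensorFlow encounters the variables, so if you swap two variables in your code, they get different names. After that, you can no longer load a previously saved checkpoint, because TensorFlow finds a different tensor under the same name.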
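To make the effect concrete, here is a toy model of counter-based naming in plain Python. It is not TensorFlow's implementation, and `ToyNamer` and `name_variables` are hypothetical names used only for illustration. It assumes the training graph created one extra unnamed scalar variable before the first convolution weights, which is consistent with the scalar Variable entry in the checkpoint listing.

```python
class ToyNamer(object):
    """Toy stand-in for automatic naming: the first unnamed variable is
    called 'Variable', later ones get the suffixes _1, _2, ... in
    creation order."""

    def __init__(self):
        self._counts = {}

    def unique_name(self, base="Variable"):
        seen = self._counts.get(base, 0)
        self._counts[base] = seen + 1
        return base if seen == 0 else "%s_%d" % (base, seen)


def name_variables(shapes):
    """Assign automatic names to variables created in the given order."""
    namer = ToyNamer()
    return [(namer.unique_name(), shape) for shape in shapes]


# Assumed training graph: a scalar variable is created first.
training = name_variables([[], [5, 5, 1, 48], [48]])
# Assumed restore graph: the same layers, but without the scalar.
restoring = name_variables([[5, 5, 1, 48], [48]])

print(training)
print(restoring)
```

In this toy model the name Variable_1 refers to the [5, 5, 1, 48] weights in the first graph and to the [48] bias in the second, which is the same pairing of shapes that the Assign error reports.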

Best practice is to give each variable an explicit name via the name argument:

W_conv1 = tf.Variable(..., name='W_conv1')
b_conv1 = tf.Variable(..., name='b_conv1')
...

This makes the code more robust to small changes in the model.