Question

我在尝试恢复经过训练的评估模型时遇到错误，但仅在评估测试集时才出错。错误是：

InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2325,11] rhs shape= [4891,11]

注意，lhs shape = [2325,11]和rhs shape = [4891,11]对应于测试集中的2325个图像和训练集中的4891个图像; 11是11类的单热编码 - 所以这些可能对应于标签。当我在训练集上运行评估时，维度匹配并且没有错误结果。帮助将不胜感激！

下面的完整堆栈跟踪：

Traceback (most recent call last):
  File "eval.py", line 75, in <module>
    main()
  File "eval.py", line 70, in main
    acc_annotation, acc_retrieval = evaluate(partition="test")
  File "eval.py", line 34, in evaluate
    restorer.restore(sess, tf.train.latest_checkpoint(SAVED_MODEL_DIR))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1388, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2325,11] rhs shape= [4891,11]
     [[Node: save/Assign_5 = Assign[T=DT_FLOAT, _class=["loc:@input/Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](input/Variable_1, save/RestoreV2_5)]]

Caused by op u'save/Assign_5', defined at:
  File "eval.py", line 75, in <module>
    main()
  File "eval.py", line 70, in main
    acc_annotation, acc_retrieval = evaluate(partition="test")
  File "eval.py", line 25, in evaluate
    restorer = tf.train.Saver()  # For saving the model
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1000, in __init__
    self.build()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1030, in build
    restore_sequentially=self._restore_sequentially)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 624, in build
    restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 373, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 130, in restore
    self.op.get_shape().is_fully_defined())
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
    use_locking=use_locking, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [2325,11] rhs shape= [4891,11]
     [[Node: save/Assign_5 = Assign[T=DT_FLOAT, _class=["loc:@input/Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](input/Variable_1, save/RestoreV2_5)]]

更新

我只是查看了检查点文件中的张量形状，看起来保护程序甚至保存了模型的输入。我需要重新配置我的培训代码或以其他方式弄清楚如何从检查点中排除模型输入（标签和图像）：

('tensor_name: ', 'conv2-layer/bias/Adam_1')
(512,)
('tensor_name: ', 'input/Variable_1')
(4891, 11)
('tensor_name: ', 'conv2-layer/weights_1/Adam')
(5, 1, 64, 512)
('tensor_name: ', 'conv1-layer/weights_1')
(5, 23, 1, 64)
('tensor_name: ', 'conv2-layer/weights_1')
(5, 1, 64, 512)
('tensor_name: ', 'conv2-layer/weights_1/Adam_1')
(5, 1, 64, 512)
('tensor_name: ', 'input/Variable')
(4891, 100, 23, 1)
('tensor_name: ', 'conv1-layer/weights_1/Adam_1')
(5, 23, 1, 64)
('tensor_name: ', 'conv1-layer/bias/Adam')
(64,)
('tensor_name: ', 'beta2_power')
()
('tensor_name: ', 'conv2-layer/bias/Adam')
(512,)
('tensor_name: ', 'conv1-layer/bias/Adam_1')
(64,)
('tensor_name: ', 'conv2-layer/bias')
(512,)
('tensor_name: ', 'conv1-layer/bias')
(64,)
('tensor_name: ', 'beta1_power')
()
('tensor_name: ', 'conv1-layer/weights_1/Adam')
(5, 23, 1, 64)
('tensor_name: ', 'Variable')
()

Answer 1

查看定义模型的代码会很有用;但从我可以理解的情况来看，您似乎已将输入定义为tf.Variable。变量是允许优化器更改的值，以便最小化损失函数。变量是模型的学习权重，这就是Tensorflow保存它们以便以后可以恢复的原因。

您应该使用tf.Placeholder将输入数据提供给图表。

恢复经过训练的eval模型时出现Tensorflow错误，无效形状问题？

1 个答案: