NotFoundError in TensorFlow when restoring a model

Posted: 2018-06-29 22:00:19

Tags: python windows tensorflow deep-learning tensorboard

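I built and saved a TensorFlow model, and I am now trying to restore it and use it. I am using the old approach because the code was written for an older version of TensorFlow (I am currently on Python 3.5 and TensorFlow 1.8.0).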

This is the code I use to save the model:

import tensorflow as tf

sess = tf.InteractiveSession()
...
# build the computational graph and all the layers. for example, the 1st layer:
W_conv1 = weight_variable([first_conv_kernel_size, first_conv_kernel_size, 1, first_conv_output_channels]) # 5x5 patch, 1 input channel, 32 output channels (features)
b_conv1 = bias_variable([first_conv_output_channels])
x_image = tf.reshape(x, [-1, patch_size, patch_size, 1]) # reshape x to a 4D tensor: dims 2 and 3 are the image dimensions, dim 4 is the single color channel
...
sess.run(tf.initialize_all_variables()) # deprecated; tf.global_variables_initializer() replaces it
...
# some more code
...
# saving the model:
saver = tf.train.Saver()
save_path = saver.save(sess, main_code_folder + 'code_files/Tensor_Flow/version1/built_networks/10 - testing_the_train_function/model.ckpt')

And this is how I restore the model:

# initial parameters + build layers for tensorboard visualisation. for example, layer 1:
with tf.name_scope('conv_layer1'):
    # build the first layer
    with tf.name_scope('weights'):
        W_conv1 = weight_variable([first_conv_kernel_size, first_conv_kernel_size, 1, first_conv_output_channels]) # 5x5 patch, 1 input channel, 32 output channels (features)
        variable_summaries(W_conv1)
    with tf.name_scope('biases'):
        b_conv1 = bias_variable([first_conv_output_channels])
        variable_summaries(b_conv1)

    x_image = tf.reshape(x, [-1, patch_size, patch_size, 1]) # reshape x to a 4D tensor: dims 2 and 3 are the image dimensions, dim 4 is the single color channel

    with tf.name_scope('Wx_plus_b'):
        Wx_plus_b=conv2d(x_image, W_conv1) + b_conv1
        variable_summaries(Wx_plus_b)

    # apply the layers
    h_conv1 = tf.nn.relu(Wx_plus_b)
...
saver = tf.train.Saver()
savepath = make_folder_name_Win_format(main_code_folder + 'code_files/Tensor_Flow/version1/built_networks/10 - testing_the_train_function/')
saver.restore(sess, save_path = savepath + '{}'.format(model_name))

When I run this code, I get the following error:

tensorflow.python.framework.errors_impl.NotFoundError: Key conv_layer1/biases/Variable not found in checkpoint

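I have seen a few similar questions that were solved and tried their solutions. None of them worked. The directory name is the same in both scripts (as far as I can tell; can you suggest how to confirm this?), and the model was saved correctly (same caveat).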

Any help would be greatly appreciated. Thanks!

Full error log below:

2018-06-30 00:53:02.524332: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key conv_layer1/biases/Variable not found in checkpoint
Traceback (most recent call last):
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
    return fn(*args)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key conv_layer1/biases/Variable not found in checkpoint
     [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
     [[Node: save/RestoreV2/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Roi/Desktop/Code_Win_Ver/code_files/Tensor_Flow/version1/find_labels_for_db.py", line 252, in <module>
    saver.restore(sess, save_path = savepath + '{}'.format(model_name))
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1802, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
    run_metadata)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key conv_layer1/biases/Variable not found in checkpoint
     [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
     [[Node: save/RestoreV2/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'save/RestoreV2', defined at:
  File "C:/Users/Roi/Desktop/Code_Win_Ver/code_files/Tensor_Flow/version1/find_labels_for_db.py", line 247, in <module>
    saver = tf.train.Saver()
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1338, in __init__
    self.build()
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1347, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1384, in _build
    build_save=build_save, build_restore=build_restore)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 835, in _build_internal
    restore_sequentially, reshape)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 472, in _AddRestoreOps
    restore_sequentially)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 886, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "C:\Python35\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1546, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
    op_def=op_def)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key conv_layer1/biases/Variable not found in checkpoint
     [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
     [[Node: save/RestoreV2/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]


Process finished with exit code 1

1 answer:

Answer 0 (score: 0)

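The error occurs because the variable is not present in the checkpoint. This situation arises when the saver is created before the corresponding variables, since tf.train.Saver() called without arguments only covers the variables that exist at the moment it is constructed: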

saver = tf.train.Saver()  # created before the layer's variables exist
conv_layer1 = ...

saver.restore(sess, save_path=...)

If save was then called after training (or by anything else that calls save), the variables added afterwards, such as conv_layer1/biases/Variable, are not written to that checkpoint.

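If the code was later rearranged so that the saver is created after those variables, the restore asks the checkpoint for them and fails, for example: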

conv_layer1 = ...
saver = tf.train.Saver()  # now includes the layer's variables

saver.restore(sess, save_path=...)
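
To confirm which variables are actually missing, it can help to compare the names the graph asks for against the keys stored in the checkpoint. The following is only a minimal sketch of that comparison in plain Python; the helper diff_variable_names and the example names are hypothetical. The example names are modelled on the question, where the restoring script wraps its layers in tf.name_scope blocks and the saving script, as posted, does not, which would also change the variable names. That is an observation from the posted code, not something confirmed by the asker.

```python
def diff_variable_names(graph_names, checkpoint_keys):
    """Compare the names the graph wants with the keys a checkpoint holds.

    Returns (missing, unused):
      missing - names the graph asks for that the checkpoint does not contain
      unused  - checkpoint keys that no graph variable asks for
    """
    graph_set = set(graph_names)
    ckpt_set = set(checkpoint_keys)
    missing = sorted(graph_set - ckpt_set)
    unused = sorted(ckpt_set - graph_set)
    return missing, unused


# Hypothetical names for illustration only: default names as they might appear
# when variables are created without a name scope, versus scoped names like
# the one in the error message.
checkpoint_keys = ["Variable", "Variable_1"]
graph_names = ["conv_layer1/weights/Variable", "conv_layer1/biases/Variable"]

missing, unused = diff_variable_names(graph_names, checkpoint_keys)
print("not in checkpoint:", missing)
print("not in graph:", unused)
```

In TensorFlow 1.x the two input lists could presumably be obtained with [v.op.name for v in tf.global_variables()] for the graph and [name for name, shape in tf.train.list_variables(checkpoint_path)] for the checkpoint, where checkpoint_path is the same path that was passed to saver.save. If the two lists differ only by a scope prefix, the name mismatch rather than the directory is the likely cause.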