Question

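I have the following setup: I train a model on a GPU server and save checkpoints with the tf.train.Saver() functionality inside tf.train.Supervisor(). After training, I want to transfer the model to my laptop and load it for inference. When I try to restore the model with self.saver.restore(sess,self.checkpoint_path) (having recreated the correct graph beforehand), I get the following error: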

E tensorflow/core/client/tensor_c_api.cc:485] Cannot assign a device to node 'worker_0/save/Const': Could not satisfy explicit device specification '/job:worker/task:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
Colocation Debug Info:
Colocation group had the following types and devices: 
Identity: CPU 
Const: CPU 
 [[Node: worker_0/save/Const = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: model>, _device="/job:worker/task:0"]()]]

Analysis

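When I inspect the attributes of the cpkt object returned by

cpkt = tf.train.get_checkpoint_state(self.checkpoint_dir)

I see that cpkt.model_checkpoint_path points to the original path on the server where the checkpoint was created, not to self.checkpoint_path, which is where I am trying to restore the model from.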

Are these two things related? Or is there another cause for the error message above?

Any help would be much appreciated,

Mat

Answer 1

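It sounds like your device assignments were saved along with the graph, and the same devices are not available in the environment where you restore it.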

Both freeze_graph and import_meta_graph have a clear_devices flag that can be used to clear that information.

Alternatively, you can edit the pbtxt file that holds the graph information and manually delete every line that starts with device:
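The manual edit can also be scripted. The snippet below is only a sketch using plain string handling: strip_device_lines is a hypothetical helper, and the sample text is a made-up fragment in the shape of a text-format graph, not output from the model in the question.

```python
def strip_device_lines(pbtxt_text):
    """Return pbtxt_text without the lines whose first token is 'device:'."""
    kept = [
        line
        for line in pbtxt_text.splitlines(keepends=True)
        # Lines are indented inside 'node { ... }' blocks, so ignore leading spaces.
        if not line.lstrip().startswith("device:")
    ]
    return "".join(kept)


if __name__ == "__main__":
    # Made-up fragment for illustration only.
    sample = (
        'node {\n'
        '  name: "worker_0/save/Const"\n'
        '  op: "Const"\n'
        '  device: "/job:worker/task:0"\n'
        '}\n'
    )
    print(strip_device_lines(sample))
```

For a real file, read its contents, pass them through the helper, and write the result to a new file so the original stays available.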

Moving checkpoints in TensorFlow
