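I am restoring two trained models into a single model so that I can fine-tune it with TensorFlow in Google Colab. The first model restores successfully, but after the second model is restored, the output shows the message "No checkpoint file found". It is logged neither as a warning nor as an error, so I don't know whether it means the restore failed.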
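To debug this, I simply pointed the second model's checkpoint path at the first model's directory. It again reported "No checkpoint file found" and then reported missing tensor names. Does this mean that, despite the "No checkpoint file found" message, the model will still be restored as long as all the required files are in the directory?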
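One way to check the "are all the required files in the directory?" part is to look at the files on disk for the exact prefix passed to saver2.restore. The helper below is a minimal sketch; checkpoint_files_status is a hypothetical name, not a TensorFlow API. It assumes TensorFlow's default V2 checkpoint format, where a prefix such as model.ckpt-57376 is stored as model.ckpt-57376.index plus one or more model.ckpt-57376.data-* shards, optionally with a .meta graph file, and where the directory may also hold a "checkpoint" text file that tf.train.latest_checkpoint reads.

```python
import glob
import os


def checkpoint_files_status(prefix):
    """Report which files of a V2-format checkpoint prefix exist on disk.

    `prefix` is the path given to Saver.restore, without any extension,
    e.g. "drive/My Drive/ckpt_cal/model.ckpt-57376".
    Hypothetical helper; assumes the default V2 checkpoint layout.
    """
    directory = os.path.dirname(prefix) or "."
    return {
        # Index mapping tensor names to their location in the data shards.
        "index": os.path.isfile(prefix + ".index"),
        # Shards holding the variable values (e.g. ".data-00000-of-00001").
        # glob.escape keeps characters such as "[" in the path literal.
        "data_shards": sorted(glob.glob(glob.escape(prefix) + ".data-*")),
        # Serialized graph; only needed when importing it with import_meta_graph.
        "meta": os.path.isfile(prefix + ".meta"),
        # Text file naming the latest checkpoint in this directory.
        "state_file": os.path.isfile(os.path.join(directory, "checkpoint")),
    }
```

For example, checkpoint_files_status("drive/My Drive/ckpt_cal/model.ckpt-57376") shows whether the .index and .data files for that step exist. As far as I understand, Saver.restore(sess, path) with an explicit prefix reads only those files and does not consult the directory's "checkpoint" file.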
Here is the relevant code:
with tf.train.MonitoredTrainingSession(checkpoint_dir=train_dir,
                                       hooks=[tf.train.StopAtStepHook(last_step=max_steps),
                                              tf.train.NanTensorHook(loss),
                                              tf.train.CheckpointSaverHook(checkpoint_dir=train_dir, saver=saver3, save_steps=1000),
                                              _LoggerHook(),
                                              _LoggerHook2(),
                                              #_LoggerHook3(),
                                              _LoggerHook4()],
                                       #scaffold=scaffold,
                                       config=tf.ConfigProto(
                                           log_device_placement=log_device_placement)) as mon_sess:
    #mon_sess.run(tf.global_variables_initializer())
    saver1.restore(mon_sess, "drive/My Drive/ckpt_basic_new/model.ckpt-100000")
    #saver2.restore(mon_sess, "drive/My Drive/ckpt_basic_new/model.ckpt-100000")
    saver2.restore(mon_sess, "drive/My Drive/ckpt_cal/model.ckpt-57376")
    while not mon_sess.should_stop():
        mon_sess.run(train_op, feed_dict={training: True})
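The "missing tensor names" error from the debugging step can be narrowed down by comparing the names a Saver looks up with the names actually stored in the checkpoint. Below is a minimal pure-Python sketch. It assumes the checkpoint contents come from tf.train.list_variables(path), which returns (name, shape) pairs, and that the required names are the keys of the Saver's var_list dict, or typically v.op.name for each variable when var_list is a list. compare_checkpoint_names and the example names in the usage note are hypothetical.

```python
def compare_checkpoint_names(required_names, checkpoint_vars):
    """Compare the names a Saver needs with the names stored in a checkpoint.

    required_names: names the Saver will look up (hypothetical input, e.g.
        the keys of its var_list dict, or v.op.name for each variable).
    checkpoint_vars: (name, shape) pairs, as returned by
        tf.train.list_variables(checkpoint_path).
    """
    available = {name for name, _shape in checkpoint_vars}
    required = set(required_names)
    return {
        # Names the Saver expects but the checkpoint does not contain.
        "missing": sorted(required - available),
        # Names stored in the checkpoint that this Saver does not restore.
        "unused": sorted(available - required),
    }
```

For example, with required names ["conv1/weights", "fc/weights"] and a checkpoint listing [("conv1/weights", [5, 5, 3, 64]), ("global_step", [])], the result is {"missing": ["fc/weights"], "unused": ["global_step"]}. Any entry under "missing" would be expected to make saver2.restore fail for that tensor, so an empty "missing" list for model.ckpt-57376 would rule that cause out.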
Output:
I0719 15:11:00.575512 140037786351488 saver.py:1280] Restoring parameters from drive/My Drive/ckpt_basic_new/model.ckpt-100000
2019-07-19 15:11:06.927919: step 0, loss = 418.70550537 (5749.2 examples/sec; 0.022 sec/batch)
I0719 15:11:11.685403 140037786351488 saver.py:1280] Restoring parameters from drive/My Drive/ckpt_cal/model.ckpt-57376
No checkpoint file found
W0719 15:11:12.715127 140037786351488 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
Thanks in advance for any replies!