Strange things happen when I restore my model in TensorFlow

Time: 2017-04-17 02:19:46

Tags: python tensorflow deep-learning

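I just want to load a model I saved earlier and train it further. My code works fine up to the restore step, but things get strange when I call 'sess.run': the program exits immediately without executing 'sess.run'.

However, when I remove my AdamOptimizer op, 'sess.run' works again.

Why?

Here is the code: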

ckpt_state = tf.train.get_checkpoint_state(last_checkpoint_path)
if not ckpt_state or not ckpt_state.model_checkpoint_path:
    print('No check point files are found!')
    return


ckpt_files = ckpt_state.all_model_checkpoint_paths
num_ckpt = len(ckpt_files)

if num_ckpt < 1:
    print('No check point files are found!')
    return

low_res_holder = tf.placeholder(tf.float32, shape=[BATCH_SIZE, INPUT_SIZE, INPUT_SIZE, NUM_CHENNELS])
high_res_holder = tf.placeholder(tf.float32, shape=[BATCH_SIZE, LABEL_SIZE, LABEL_SIZE, NUM_CHENNELS])

keep_prob = tf.placeholder(tf.float32)
is_training = tf.placeholder("bool", shape=[])

global_step = tf.Variable(0, trainable=False, name='global_step')

inferences = models.creat_Dense_Modelpatches(low_res_holder, 13, is_training, keep_prob)
training_loss = models.loss(inferences, high_res_holder, name='training_loss')

low_res_batches, high_res_batches = batch_queue_for_testing(TESTING_DATA_PATH)

learning_rate = tf.train.inverse_time_decay(0.001, global_step, 10000, 2)

train_step = tf.train.AdamOptimizer(learning_rate).minimize(training_loss, global_step=global_step)

config = tf.ConfigProto()
config.gpu_options.allow_growth = True

sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())
tf.train.start_queue_runners(sess=sess)

saver = tf.train.Saver(tf.global_variables())

ckpt_file = ckpt_files[-1]

saver.restore(sess, ckpt_file)

low_res_images, high_res_images = sess.run([low_res_batches, high_res_batches])

print("thie code has ran this line...")

When I run this code with

train_step = tf.train.AdamOptimizer(learning_rate).minimize(training_loss, global_step=global_step)

the output is:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)
mt@sj408:~/JP/DR/DR$

But when I remove the train_step op, the output looks like this:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)
thie code has ran this line...
mt@sj408:~/JP/DR/DR$

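1 Answer:

Answer 0: (score: 0)

You probably need to join all the threads that are used for asynchronous execution. Here is an excerpt from the example at https://www.tensorflow.org/programmers_guide/reading_data: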
with tf.Session() as sess:
  # Start populating the filename queue.
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)

  for i in range(1200):
    # Retrieve a single instance:
    example, label = sess.run([features, col5])

  coord.request_stop()    # <==== You are missing this
  coord.join(threads)     # <==== And this
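
To show the stop-and-join pattern in isolation, here is a minimal sketch that uses only the Python standard library instead of TensorFlow. The names `producer` and `run` are hypothetical, `threading.Event` only stands in for `tf.train.Coordinator`, and the producer threads only stand in for queue runners; this is an analogy under those assumptions, not how TensorFlow implements it.

```python
import queue
import threading


def producer(q, stop_event):
    """Keep the queue filled until asked to stop (stand-in for a queue runner)."""
    i = 0
    while not stop_event.is_set():
        try:
            q.put(i, timeout=0.05)
            i += 1
        except queue.Full:
            continue  # queue is full; loop around and re-check the stop flag


def run(num_items=5):
    q = queue.Queue(maxsize=2)
    stop_event = threading.Event()  # plays the role of the coordinator
    threads = [
        threading.Thread(target=producer, args=(q, stop_event))
        for _ in range(2)
    ]
    for t in threads:
        t.start()

    items = []
    try:
        # Consume a fixed number of items, like the sess.run loop above.
        for _ in range(num_items):
            items.append(q.get(timeout=1.0))
    finally:
        stop_event.set()      # analogous to coord.request_stop()
        for t in threads:
            t.join()          # analogous to coord.join(threads)
    return items, [t.is_alive() for t in threads]


items, alive = run()
```

Putting the stop request and the join in a `finally` block means the background threads are shut down even if the consuming loop raises, so the main thread only exits after they have finished.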

If this does not solve your problem, it would help to provide a minimal working example that I can debug locally.