Question

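I'm trying to add batch normalization to my CNN and have read many posts on how to do it, but my implementation still produces an array of NaNs when training is set to False.

Even when I set training to True at test time and test on the training images, the results are not NaN, but they are worse than at training time.

I use a decay of 0.9 and train for 15,000 iterations.

Here is my graph construction, which adds the update ops as a dependency, as advised in the tf.layers.batch_normalization documentation, and then runs everything in a session: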

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    phase_train = tf.placeholder(tf.bool, name='phase_train')

    ###### Other placeholders and variables declarations ######

    # Build a Graph that computes the logits predictions from the inference model.

    loss, eval_prediction = inference(train_data_node, train_labels_node, batch_size, phase_train, dropout_out_keep_prob)

    # Build a Graph that trains the model with one batch of examples and updates the model parameters.

    ###### Should I rather put the dependency here ? ######
    train_op = train(loss, global_step)

    saver = tf.train.Saver(tf.global_variables())

    with tf.Session() as sess:
        init = tf.global_variables_initializer()
        sess.run(init)

        # Start the queue runners.
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        for step in range(startstep, startstep + max_steps):
            feed_dict = fill_feed_dict(train_labels_node, train_data_node, dropout_out_keep_prob, phase_train, batch_size, phase_train_val=True, drop_out_keep_prob_val=1.)
            _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)

Here is my batch_norm function call:

def batch_norm_layer(inputT, is_training, scope):
    return tf.layers.batch_normalization(inputT, training=is_training, center=False, reuse=None, momentum=0.9)
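
For reference, the difference between training=True and training=False can be sketched in plain Python. This is a simplified illustration, not the TensorFlow implementation: the class name `BatchNorm1D` is hypothetical, it handles a single scalar feature with no learned scale or offset, and the epsilon value and the initial moving statistics (mean 0, variance 1) are assumed to match the layer's defaults.

```python
import math


class BatchNorm1D:
    """Toy batch normalization over a list of scalars (hypothetical helper)."""

    def __init__(self, momentum=0.9, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        # Assumed initial values of the moving statistics.
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, batch, training):
        if training:
            # Training mode: normalize with the statistics of the current batch.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # This step plays the role of the update ops: without it the
            # moving statistics never leave their initial values.
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode: normalize with the accumulated moving statistics.
            mean, var = self.moving_mean, self.moving_var
        return [(x - mean) / math.sqrt(var + self.epsilon) for x in batch]


batch = [10.0, 12.0, 14.0]

# Moving statistics were updated during training: both modes roughly agree.
bn = BatchNorm1D(momentum=0.9)
for _ in range(200):
    bn(batch, training=True)
train_out = bn(batch, training=True)
test_out = bn(batch, training=False)

# Moving statistics were never updated: inference output is not normalized.
stale = BatchNorm1D(momentum=0.9)
stale_out = stale(batch, training=False)
```

In this sketch the inference output only matches the training output once the moving statistics have been updated. In TF 1.x, tf.get_collection returns the ops that are in the collection at the moment it is called, so the batch-norm update ops are only present after the layers have been built; that may be relevant to the "Should I rather put the dependency here ?" comment in the graph code.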

Here is how I restore the model for testing:

phase_train = tf.placeholder(tf.bool, name='phase_train')

###### Other placeholder definitions ######

loss, logits = inference(test_data_node, test_labels_node, batch_size, phase_train, drop_out_keep_prob)
pred = tf.argmax(logits, dimension=3)

saver = tf.train.Saver()

with tf.Session() as sess:
  saver.restore(sess, test_ckpt)

  threads = tf.train.start_queue_runners(sess=sess)

  feed_dict = fill_feed_dict(test_labels_node, test_data_node, drop_out_keep_prob, phase_train, batch_size=1, phase_train_val=False, drop_out_keep_prob_val=1.)

  pred_loss, dense_prediction, predicted_image = sess.run([loss, logits, pred], feed_dict=feed_dict)

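Here dense_prediction is an array of NaNs, so predicted_image is all 0s. Is there an error in my construction? How can I fix or diagnose it?

Any help would be welcome. I have read a lot of tutorials that use a "hand-made" batch norm, but I could not find a well-guided tutorial on how to use the official batch norm. I guess that is because it is too obvious, but it isn't for me!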

Answer 1

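The problem seemed to come from using batch normalization together with the tf.nn.dropout implementation of dropout.

Switching to tf.layers.dropout solved the issue.

Using batch normalization together with dropout gave NaNs in the test phase.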
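
The answer does not say why the switch removed the NaNs, so the following is only a sketch of the interface difference between the two calls, not an explanation of the bug. In TF 1.x, tf.nn.dropout takes a keep_prob and has no training switch, while tf.layers.dropout takes a rate (the fraction to drop) and a training flag, and returns its input unchanged when training is False. The function below is a hypothetical plain-Python stand-in for the second style.

```python
import random


def dropout(values, rate, training, rng=random):
    """Inverted dropout with an explicit training switch (illustrative only)."""
    if not training or rate == 0.0:
        # Inference mode: the input passes through unchanged.
        return list(values)
    keep_prob = 1.0 - rate
    # Training mode: drop each value with probability `rate` and rescale
    # the survivors so the expected value stays the same.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
test_time = dropout(activations, rate=0.5, training=False)
train_time = dropout(activations, rate=0.5, training=True, rng=random.Random(0))
```

With this style the same boolean that is passed as `training` to tf.layers.batch_normalization (here, the phase_train placeholder) can also drive the dropout layer, instead of feeding a separate keep probability at test time.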
