Question

My training script for a TensorFlow model, lightly modified from online tutorials:

def train(data_set_dir, train_set_dir):
    data = data_input.read_data_sets(data_set_dir, train_set_dir)

    with tf.Graph().as_default():
        global_step = tf.Variable(0, trainable=False)
        # defines placeholders (type=tf.float32)
        images_placeholder, labels_placeholder = placeholder_inputs(batch_size, image_size, channels)

        logits = model.inference(images_placeholder, num_classes)
        loss = loss(logits, labels_placeholder, num_classes)
        train_op = training(loss, global_step, batch_size)

        saver = tf.train.Saver(tf.all_variables()) 
        summary_op = tf.merge_all_summaries()  
        init = tf.initialize_all_variables()
        sess = tf.Session()
        sess.run(init)
        summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, sess.graph)

        for step in range(max_steps):
            start_time = time.time()
            feed_dict = fill_feed_dict(data, images_placeholder, labels_placeholder, batch_size)
            _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
            # ... continue to print loss_value, run summaries and save checkpoints

The placeholder_inputs function called above is:

def placeholder_inputs(batch_size, img_size, channels):
    images_pl = tf.placeholder(tf.float32,
                               shape=(batch_size, img_size, img_size, channels), name='images')
    labels_pl = tf.placeholder(tf.float32,
                               shape=(batch_size, img_size, img_size), name='labels')
    return images_pl, labels_pl

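To clarify, the data I am working with is for per-pixel classification in a segmentation problem. As seen above, this is a binary classification problem.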

The feed_dict function is:

def fill_feed_dict(data_set, images_pl, labels_pl, batch_size):
    images_feed, labels_feed = data_set.next_batch(batch_size)
    feed_dict = {images_pl: images_feed, labels_pl: labels_feed}
    return feed_dict

Where I'm stuck:

tensorflow.python.framework.errors.InvalidArgumentError: You must feed a value for placeholder tensor 'labels' with dtype float and shape [1,750,750]
 [[Node: labels = Placeholder[dtype=DT_FLOAT, shape=[1,750,750], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

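The traceback shows that the error is caused by the 'labels' tensor from the placeholder_inputs function. Also, as far as I can see, the error alternates between the two placeholders, seemingly at random. One time it is the 'labels' [labels_pl] tensor, another time it is my 'images' [images_pl] tensor.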

Error in detail:

File ".../script.py", line 32, in placeholder_inputs
  shape=(batch_size, img_size, img_size), name='labels')
File ".../tensorflow/python/ops/array_ops.py", line 895, in placeholder
  name=name)
File ".../tensorflow/python/ops/gen_array_ops.py", line 1238, in _placeholder 
  name=name)
File ".../tensorflow/python/ops/op_def_library.py", line 704, in apply_op
  op_def=op_def)
File ".../tensorflow/python/framework/ops.py", line 2260, in create_op
  original_op=self._default_original_op, op_def=op_def)
File "/tensorflow/python/framework/ops.py", line 1230, in __init__
  self._traceback = _extract_stack()

What I have tried/checked:

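Placing feed_dict outside the for loop did not help either.
Verified that the training data directory contains enough data to satisfy the batch_size requirement.
Tried multiple variations of the placeholder dtype, on the assumption that 'float' in the stack trace is the key.
Cross-checked the data shapes. They are exactly as specified in the placeholders.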
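
One way to automate the shape cross-check from the list above is sketched here. find_feed_mismatches is a hypothetical helper, and the Fake* classes are stand-ins (assumptions, not TensorFlow types) so the snippet runs without TensorFlow installed. With real tensors it would rely only on get_shape().as_list(), the name attribute, and the shape attribute of the fed arrays.

```python
from collections import namedtuple


def shapes_compatible(expected, actual):
    """True if ranks match and every known (non-None) dimension agrees."""
    return len(expected) == len(actual) and all(
        e is None or e == a for e, a in zip(expected, actual))


def find_feed_mismatches(feed_dict):
    """Return (name, expected_shape, actual_shape) for each mismatched feed."""
    mismatches = []
    for placeholder, value in feed_dict.items():
        expected = tuple(placeholder.get_shape().as_list())
        actual = tuple(value.shape)
        if not shapes_compatible(expected, actual):
            mismatches.append((placeholder.name, expected, actual))
    return mismatches


# Stand-ins for illustration only; they mimic the few attributes used above.
class FakeShape:
    def __init__(self, dims):
        self._dims = list(dims)

    def as_list(self):
        return list(self._dims)


class FakePlaceholder:
    def __init__(self, name, dims):
        self.name = name
        self._shape = FakeShape(dims)

    def get_shape(self):
        return self._shape


FakeArray = namedtuple("FakeArray", ["shape"])

images_pl = FakePlaceholder("images:0", (1, 750, 750, 3))
labels_pl = FakePlaceholder("labels:0", (1, 750, 750))
good_feed = {images_pl: FakeArray((1, 750, 750, 3)),
             labels_pl: FakeArray((1, 750, 750))}
bad_feed = {images_pl: FakeArray((1, 750, 750, 3)),
            labels_pl: FakeArray((1, 750, 750, 1))}

print(find_feed_mismatches(good_feed))
print(find_feed_mismatches(bad_feed))
```

An empty result for the shapes reported in this question would suggest that the fed values themselves are not the problem.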

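Maybe this is much simpler than I think. Maybe there is even a small typo here that I cannot see. Suggestions? I believe I have exhausted my options, and I am looking for someone to offer fresh insight into this problem.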

I have already referred to this description of the error.

Update

I printed feed_dict just before session.run (as suggested in the comments here) and saw that the expected values are being fed into the placeholders:

{<tf.Tensor 'images:0' shape=(1, 750, 750, 3) dtype=float32>:
array([[[[-0.1556225 , -0.13209309, -0.15954407],
     [-0.15954407, -0.12032838, -0.13601466],
     .....
     [-0.03405387,  0.04829907,  0.09535789]]]], dtype=float32),
 <tf.Tensor 'labels:0' shape=(1, 750, 750) dtype=float32>: 
 array([[[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       .....
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]]], dtype=float32)}

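Something I did not mention earlier: the loop does run the first time. I get the output for step = 0, and the script exits immediately after printing the loss_value statement I specified for step=0.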

Update 2:

The problem seems to be with summary_op:

if step % 100 == 0:
    summary_str = sess.run(summary_op)
    summary_writer.add_summary(summary_str, step)

Commenting out this block makes it work. Any ideas as to why it goes wrong?
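
One detail that may tie this block to the earlier symptom: the condition step % 100 == 0 already holds at step 0, so the summary run is attempted during the very first iteration, right after the loss for step 0 has been computed. A quick check of which steps trigger the block, using a hypothetical max_steps of 300:

```python
max_steps = 300  # assumed value, for illustration only

# Steps at which the "if step % 100 == 0" branch would execute.
summary_steps = [step for step in range(max_steps) if step % 100 == 0]
print(summary_steps)  # [0, 100, 200]
```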

Update 3: Solved

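See the answer below. One thing I noticed is that the TensorFlow CIFAR-10 example performs a similar sess.run without explicitly mentioning feed_dict, and it runs fine. How exactly does that work?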

Answer 1

An obvious mistake: I did not pass feed_dict to the session when running summary_op.

if step % 100 == 0:
    summary_str = sess.run(summary_op, feed_dict=feed_dict)
    summary_writer.add_summary(summary_str, step)

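Explicitly passing feed_dict in the session run call did the trick. But why? The TensorFlow CIFAR-10 example performs a similar sess.run without explicitly mentioning feed_dict, and it runs fine.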
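
A likely explanation, offered as a sketch rather than a confirmed diagnosis: the summaries merged into summary_op are presumably computed from tensors such as the loss, which in turn depend on the images and labels placeholders, so evaluating summary_op needs the same feeds as train_op. The CIFAR-10 example, as far as I know, builds its inputs from queue-based readers instead of placeholders, so none of its fetches has anything to feed. The toy evaluator below is plain Python, not TensorFlow; Node and evaluate are hypothetical names used only to illustrate the dependency rule.

```python
class Node:
    """A node in a toy dataflow graph (hypothetical, not a TensorFlow class)."""

    def __init__(self, name, inputs=(), fn=None):
        self.name = name
        self.inputs = tuple(inputs)
        self.fn = fn  # fn=None marks a placeholder that must be fed


def evaluate(node, feed_dict=None):
    """Recursively evaluate a node, taking placeholder values from feed_dict."""
    feed_dict = feed_dict or {}
    if node in feed_dict:
        return feed_dict[node]
    if node.fn is None:
        raise ValueError(
            "You must feed a value for placeholder %r" % node.name)
    return node.fn(*[evaluate(i, feed_dict) for i in node.inputs])


images = Node("images")
labels = Node("labels")
loss = Node("loss", [images, labels], lambda x, y: abs(x - y))
# The summary only reads the loss, yet it inherits the loss's placeholders.
summary = Node("loss_summary", [loss], lambda value: "loss=%.2f" % value)

feed = {images: 0.75, labels: 1.0}
print(evaluate(loss, feed))     # 0.25
print(evaluate(summary, feed))  # loss=0.25

try:
    evaluate(summary)  # no feed: fails although summary never names a placeholder
except ValueError as err:
    print(err)
```

In this toy version the first unfed placeholder reached is always images. The real runtime makes no such ordering promise, which would be consistent with the error naming 'labels' on one run and 'images' on another.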

feed_dict in TensorFlow throws an unexpected error (summary op)
