While training a fairly standard convolutional network I ran into a strange error. Everything starts off with a healthy-looking loss curve, but then the loss suddenly drops to zero. I was able to trace the NaNs all the way back to the input pipeline.
As you can see, I am printing the values immediately before and immediately after batching with tf.train.shuffle_batch(). The second print comes out as nan, and the problem propagates from there through the rest of the graph.
What could be causing this? I have played with different values for capacity, the number of threads, and so on.
Code and context are below. The NaNs show up in the batched before/after images, but not in the individual before/after images prior to batching.
I should note that the tfrecord files each contain an arbitrary number of examples, but I don't think that should matter for the enqueue/dequeue operations.
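(As a side note for anyone reproducing this: the same check can be made to fail loudly instead of just printing, by wrapping the tensors in tf.check_numerics. The two lines below are only a sketch, not part of my actual pipeline; before_image and before_images refer to the tensors defined in the code that follows.)

# Sketch only: tf.check_numerics returns its input unchanged but raises an
# InvalidArgumentError the moment the tensor contains a NaN or Inf, which
# pinpoints whether the bad values exist before or after shuffle_batch.
before_image = tf.check_numerics(before_image, "NaN/Inf before shuffle_batch")
before_images = tf.check_numerics(before_images, "NaN/Inf after shuffle_batch")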
# (assumes module-level imports: import random, import tensorflow as tf)
def input_pipeline(self, filenames, batch_size, num_epochs=None):
    """Function that creates a highly abstracted input pipeline consisting
    of a bunch of threads and queues given a few simple parameters.
    See https://www.tensorflow.org/versions/r0.10/how_tos/reading_data/index.html#multiple-input-pipelines
    for more information and in-depth explanations.

    Args:
      - filenames: a list of filenames of tfrecords files
      - batch_size: number of examples per dequeued batch
      - num_epochs: number of epochs to produce, or None to cycle indefinitely
    """
    random.shuffle(filenames)
    train_filenames = filenames
    train_filename_queue = (
        tf.train.string_input_producer(train_filenames,
                                       num_epochs=num_epochs,
                                       shuffle=True,
                                       seed=1))
    before_image, after_image, mask_image = (
        self._read_and_decode_reach_tfrecords(train_filename_queue))
    # min_after_dequeue defines how big a buffer we will randomly sample
    # from -- bigger means better shuffling but slower start up and more
    # memory used.
    # capacity must be larger than min_after_dequeue and the amount larger
    # determines the maximum we will prefetch. Recommendation:
    # min_after_dequeue + (num_threads + a safety margin) * batch_size
    min_after_dequeue = 1000
    safety_margin = 3
    capacity = min_after_dequeue + (3 + safety_margin) * batch_size
    capacity = 2000
    # Print the mean of the decoded tensors before batching...
    before_image = tf.Print(before_image,
                            [tf.reduce_mean(before_image + after_image)],
                            "pre_shuffle: ")
    mask_image = tf.Print(mask_image,
                          [tf.reduce_mean(mask_image)],
                          "pre_shuffle_mask: ")
    before_images, after_images, mask_images = (
        tf.train.shuffle_batch(
            [before_image, after_image, mask_image], batch_size=batch_size,
            capacity=capacity, min_after_dequeue=min_after_dequeue,
            num_threads=5, seed=1))
    # ...and again after batching; this second print is the one that shows nan.
    before_images = tf.Print(before_images,
                             [tf.reduce_mean(before_images + after_images)],
                             "post_shuffle: ")
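For completeness, this is roughly how the pipeline is consumed on the other side. It is a simplified sketch rather than my actual training loop, and it assumes input_pipeline goes on to return the three batched tensors; the coordinator/queue-runner boilerplate is the standard TF 1.x pattern for queue-based pipelines.

# Sketch of the consuming side (simplified; assumes input_pipeline returns
# (before_images, after_images, mask_images)).
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # string_input_producer's num_epochs counter is a local variable.
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            sess.run([before_images, after_images, mask_images])
    except tf.errors.OutOfRangeError:
        pass  # reached num_epochs
    finally:
        coord.request_stop()
    coord.join(threads)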