Why are my examples and labels in the wrong order?

Asked: 2017-09-10 01:40:43

Tags: python input tensorflow queue shuffle

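I am trying to read some data in TensorFlow and then match it with its labels. My setup is as follows:

  • I have an array of English letters, "a", "b", "c", "d", "e", ...
  • I have an array of "Cyrillic" letters, "a", "b", "w", "g", "d", ...
  • I have an array of numbers, 0, 1, 2, 3, 4, ...

I want to create a queue of examples containing pairs drawn from the first two arrays, such as ["b", "b"], ["d", "g"], ["c", "w"], .... I also want a queue of the numbers corresponding to those pairs, which in this case would be 1, 3, 2, ...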
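
The intended pairing can be sketched in plain Python, with no TensorFlow involved. The name label_for_pair is made up for illustration; the three lists are the ones from the code further down:

```python
letters = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
cyrillics = ["a", "b", "w", "g", "d", "e", "j", "v", "z", "i"]
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Each example is a [letter, cyrillic] pair; its label is the number
# stored at the same index in the third list.
label_for_pair = {(l, c): n for l, c, n in zip(letters, cyrillics, numbers)}

examples = [["b", "b"], ["d", "g"], ["c", "w"]]
expected_labels = [label_for_pair[tuple(pair)] for pair in examples]
print(expected_labels)  # [1, 3, 2]
```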

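However, when I generate these queues, my examples and my numbers do not match. For example, a queue of ["b", "b"], ["d", "g"], ["c", "w"], ... comes out together with a label queue of 5, 0, 2, ...

What could be causing this? For testing, I have disabled all shuffling in the queue and batch generation, but the problem persists.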



Here is my code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import tensorflow as tf

from constants import FLAGS


letters_data = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
cyrillic_letters_data = ["a", "b", "w", "g", "d", "e", "j", "v", "z", "i"]
numbers_data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]




def inputs(batch_size):
    # Get the letters and the labels
    (letters, labels) = _inputs(batch_size=batch_size)

    # Return the letters and the labels.
    return letters, labels


def read_letter(pairs_and_overlap_queue):
    # Read the letters, cyrillics, and numbers.
    letter = pairs_and_overlap_queue[0]
    cyrillic = pairs_and_overlap_queue[1]
    number = pairs_and_overlap_queue[2]

    # Do something with them
    # (doesn't matter what)
    letter = tf.substr(letter, 0, 1)
    cyrillic = tf.substr(cyrillic, 0, 1)
    number = tf.add(number, tf.constant(0))

    # Return them
    return letter, cyrillic, number


def _inputs(batch_size):
    # Get the input data
    letters = letters_data
    cyrillics = cyrillic_letters_data
    numbers = numbers_data


    # Create a queue containing the letters,
    # the cyrillics, and the numbers
    pairs_and_overlap_queue = tf.train.slice_input_producer([letters, cyrillics, numbers],
                                                            capacity=100000,
                                                            shuffle=False)

    # Perform some operations on each of those
    letter, cyrillic, number = read_letter(pairs_and_overlap_queue)

    # Combine the letters and cyrillics into one example
    combined_example = tf.stack([letter, cyrillic])


    # Ensure that the random shuffling has good mixing properties.
    min_fraction_of_examples_in_queue = 0.4
    min_queue_examples = int(FLAGS.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                             min_fraction_of_examples_in_queue)

    # Generate an example and label batch, and return it.
    return _generate_image_and_label_batch(example=combined_example, label=number,
                                           min_queue_examples=min_queue_examples,
                                           batch_size=batch_size,
                                           shuffle=False)


def _generate_image_and_label_batch(example, label, min_queue_examples,
                                    batch_size, shuffle):

    # Create a queue that shuffles the examples, and then
    # read 'batch_size' examples + labels from the example queue.
    num_preprocess_threads = FLAGS.NUM_THREADS
    if shuffle:
        examples, label_batch = tf.train.shuffle_batch(
            [example, label],
            batch_size=batch_size,
            num_threads=num_preprocess_threads,
            capacity=min_queue_examples + 6 * batch_size,
            min_after_dequeue=min_queue_examples)
    else:
        print("Not shuffling!")
        examples, label_batch = tf.train.batch(
            [example, label],
            batch_size=batch_size,
            num_threads=num_preprocess_threads,
            capacity=min_queue_examples + 6 * batch_size)

    # Return the examples and the labels batches.
    return examples, tf.reshape(label_batch, [batch_size])



lcs, nums = inputs(batch_size=3)



with tf.Session() as sess:

    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord, sess=sess)


    for i in xrange(0, 5):
        my_lcs = lcs.eval()
        my_nums = nums.eval()

        print(str(my_lcs) + " --> " + str(my_nums))

Thank you very much for your help!

1 Answer:

Answer 0 (score: 4)

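When you call eval() twice (lcs.eval() and then nums.eval()), you actually run the graph twice, so you end up mixing lcs and nums from two different batches. If you change the body of your loop to the following, both tensors are pulled during the same graph run: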

    my_lcs, my_nums = sess.run([lcs, nums])

    print(str(my_lcs) + " --> " + str(my_nums))

On my machine this gives:

[[b'g' b'j']
 [b'h' b'v']
 [b'i' b'z']] --> [6 7 8]
[[b'f' b'e']
 [b'g' b'j']
 [b'h' b'v']] --> [5 6 7]
[[b'e' b'd']
 [b'f' b'e']
 [b'g' b'j']] --> [4 5 6]
[[b'd' b'g']
 [b'e' b'd']
 [b'f' b'e']] --> [3 4 5]
[[b'c' b'w']
 [b'd' b'g']
 [b'e' b'd']] --> [2 3 4]
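
The effect can be reproduced without TensorFlow. Below is a minimal sketch, assuming a toy queue in which every run() call dequeues a fresh batch, the way each graph run does. The class ToyBatchQueue is hypothetical and is not a TensorFlow API; it only imitates the dequeue behaviour:

```python
from collections import deque

letters = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
cyrillics = ["a", "b", "w", "g", "d", "e", "j", "v", "z", "i"]
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


class ToyBatchQueue:
    """Hypothetical stand-in for a batching queue (not a TensorFlow class)."""

    def __init__(self, rows, batch_size):
        self._rows = deque(rows)
        self._batch_size = batch_size

    def run(self):
        # Every call consumes batch_size rows, so two calls see two batches.
        batch = [self._rows.popleft() for _ in range(self._batch_size)]
        examples = [[letter, cyrillic] for letter, cyrillic, _ in batch]
        labels = [number for _, _, number in batch]
        return examples, labels


rows = list(zip(letters, cyrillics, numbers))

# Two separate runs, analogous to lcs.eval() followed by nums.eval():
queue = ToyBatchQueue(rows, batch_size=3)
mismatched_examples, _ = queue.run()  # examples taken from the first batch
_, mismatched_labels = queue.run()    # labels taken from the second batch

# One run that fetches both, analogous to sess.run([lcs, nums]):
queue = ToyBatchQueue(rows, batch_size=3)
matched_examples, matched_labels = queue.run()

print(mismatched_examples, "-->", mismatched_labels)
print(matched_examples, "-->", matched_labels)
```

In the first case the labels are 3, 4, 5 while the examples are rows 0 to 2; in the second case both come from the same rows.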

See also the following post: Does Tensorflow rerun for each eval() call?