Question

I am trying to write my own MNIST digit classifier in TensorFlow, and I have run into some strange behavior with the tf.train.shuffle_batch function.

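The problem appears when I load the images and labels from separate files: shuffle_batch seems to shuffle the labels and the images independently of each other, which produces incorrectly labeled data. The data comes from here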

Is this the defined behavior of shuffle_batch? How would you suggest handling the case where the data and the labels are in different files?

Here is my code:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

DATA = 'train-images.idx3-ubyte'
LABELS = 'train-labels.idx1-ubyte'
data_queue = tf.train.string_input_producer([DATA,])
label_queue = tf.train.string_input_producer([LABELS,])

NUM_EPOCHS = 2
BATCH_SIZE = 10

reader_data = tf.FixedLengthRecordReader(record_bytes=28*28, header_bytes = 16)
reader_labels = tf.FixedLengthRecordReader(record_bytes=1, header_bytes = 8)

(_,data_rec) = reader_data.read(data_queue)
(_,label_rec) = reader_labels.read(label_queue)

image = tf.decode_raw(data_rec, tf.uint8)
image = tf.reshape(image, [28, 28, 1])
label = tf.decode_raw(label_rec, tf.uint8)
label = tf.reshape(label, [1])


image_batch, label_batch = tf.train.shuffle_batch([image, label],
                                                 batch_size=BATCH_SIZE,
                                                 capacity=100,
                                                 min_after_dequeue = 30)


sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)

image = image_batch[1]
im = image.eval()
print("im_batch shape :" + str(image_batch.get_shape().as_list()))
print("label shape :" + str(label_batch.get_shape().as_list()))
print("label is :" + str(label_batch[1].eval()))
# print("output is :" + str(conv1.eval()))

plt.imshow(np.reshape(im, [-1, 28]), cmap='gray')
plt.show()
coord.request_stop()
coord.join(threads)

Answer 1

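I think the problem arises because you evaluate image and label_batch[1] in separate Tensor.eval() calls. Each call dequeues a new batch, so the two values come from two different batches. If you instead write: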

im, lbl = sess.run([image_batch[1], label_batch[1]])

...you should get a matching image and label from the same batch.
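To illustrate the point without needing TensorFlow, here is a minimal sketch in plain Python. The FakeShuffleQueue class and its dequeue_batch method are hypothetical stand-ins that I made up for this example (they are not part of any TensorFlow API); the only behavior they are meant to mimic is that every call consumes a fresh batch, just as every eval() or sess.run() call on a batched tensor does.

```python
import random


class FakeShuffleQueue:
    """Hypothetical stand-in for a shuffling queue (not a TensorFlow class).

    Pairs are shuffled together once, so each image stays next to its label.
    Every dequeue_batch() call hands out the next, previously unseen batch.
    """

    def __init__(self, pairs, batch_size, seed=0):
        self._pairs = list(pairs)
        random.Random(seed).shuffle(self._pairs)  # shuffle (image, label) pairs as units
        self._batch_size = batch_size
        self._pos = 0

    def dequeue_batch(self):
        batch = self._pairs[self._pos:self._pos + self._batch_size]
        self._pos += self._batch_size  # the batch is consumed; the next call moves on
        images = [img for img, _ in batch]
        labels = [lbl for _, lbl in batch]
        return images, labels


# Toy data: "image" i is the string "img-i" and its label is i.
pairs = [("img-%d" % i, i) for i in range(100)]
queue = FakeShuffleQueue(pairs, batch_size=10)

# Analogue of two separate eval() calls: two dequeues, two different batches.
images_a, _ = queue.dequeue_batch()
_, labels_b = queue.dequeue_batch()
separate_match = images_a[1] == "img-%d" % labels_b[1]

# Analogue of a single sess.run([...]) call: one dequeue, one batch.
images, labels = queue.dequeue_batch()
joint_match = images[1] == "img-%d" % labels[1]

print("separate calls match:", separate_match)
print("single call matches:", joint_match)
```

In this toy model the pairs are never separated inside the queue; the mismatch comes only from reading the image and the label in two different calls.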

TensorFlow shuffle_batch with multiple files breaks the labels
