Question

I am currently trying to write a TensorFlow data input pipeline using TensorFlow queues. My data consists of JPG images with three channels (RGB), each 128x128 pixels.

My current problem is running my image_batch op: the op hangs indefinitely and I am not sure why.

Below is the code I use to build the input pipeline.

I am using three main functions:

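read_my_file_format takes a filename_queue, tries to load the file, and resizes the image
tensorflow_queue takes a list of objects and builds a TensorFlow FIFO queue. The queue is then wrapped in a queue runner, which is registered with tf.train.add_queue_runner
shuffle_queue_batch is meant to return the ops that fetch a batch of images and labels.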

Here is my code.

def read_my_file_format(filename_queue):
    reader = tf.WholeFileReader()
    filename, image_string = reader.read(filename_queue)
    image = tf.image.decode_jpeg(image_string, channels=3)
    image = tf.image.resize_images(image, size=[256, 256])
    return image

def tensorflow_queue(lst, dtype, capacity=32):
    tensor = tf.convert_to_tensor(lst, dtype=dtype)
    fq = tf.FIFOQueue(capacity=capacity, dtypes=dtype, shapes=(()))
    fq_enqueue_op = fq.enqueue_many([tensor])
    tf.train.add_queue_runner(tf.train.QueueRunner(fq, [fq_enqueue_op]*1))
    return fq

def shuffle_queue_batch(image, label, batch_size, capacity=32, min_after_dequeue=10, threads=1):
    tensor_list = [image, label]
    dtypes = [tf.float32, tf.int32]
    shapes = [image.get_shape(), label.get_shape()]
    rand_shuff_queue = tf.RandomShuffleQueue(
                                capacity=capacity,
                                min_after_dequeue=min_after_dequeue,
                                dtypes=dtypes,
                                shapes=shapes
                                )
    rand_shuff_enqueue_op = rand_shuff_queue.enqueue(tensor_list)
    tf.train.add_queue_runner(tf.train.QueueRunner(rand_shuff_queue, [rand_shuff_enqueue_op] * threads))

    image_batch, label_batch = rand_shuff_queue.dequeue_many(batch_size)
    return image_batch, label_batch

def input_pipeline(filenames, classes, min_after_dequeue=10):
    filename_queue = tf.train.string_input_producer(filenames, shuffle=False)
    classes_queue = tensorflow_queue(classes, tf.int32)
    image = read_my_file_format(filename_queue)
    label = classes_queue.dequeue()
    image_batch, label_batch = shuffle_queue_batch(image, label, BATCH_SIZE, min_after_dequeue=min_after_dequeue)

    return image_batch, label_batch


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # get_image_data returns:
    #    filenames is a list of strings of the filenames
    #    classes is a list of ints
    #    datasize = number of images in dataset
    filenames, classes, datasize = get_image_data()


    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    image_batch, label_batch = input_pipeline(filenames, classes)

    print('Starting training')
    for ep in range(NUM_EPOCHS):
        total_loss = 0
        for _ in range(datasize // BATCH_SIZE * BATCH_SIZE):
            print('fetching batch')
            x_batch = sess.run([image_batch])
            print('x batch')
            y_batch = sess.run([label_batch])
            x_batch, y_batch = sess.run([image_batch, label_batch])

Thanks in advance.

Answer 1

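I strongly recommend switching your input pipeline from tf.train queues to tf.data. Queue-based input pipelines are inefficient and hard to maintain.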
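For illustration, here is a minimal sketch of what the same pipeline might look like with tf.data. It assumes TensorFlow 2.4 or later (for tf.data.AUTOTUNE) and eager execution. BATCH_SIZE, the shuffle buffer size, and the temporary random JPEGs that stand in for get_image_data() are all made up for the example and do not come from the question.

```python
import os
import tempfile

import tensorflow as tf

BATCH_SIZE = 4  # assumed value; the question does not state it


def load_example(filename, label):
    """Read one JPEG, decode it to RGB, and resize to 256x256 as in the question."""
    image_string = tf.io.read_file(filename)
    image = tf.io.decode_jpeg(image_string, channels=3)
    image = tf.image.resize(image, [256, 256])  # result is float32
    return image, label


def input_pipeline(filenames, classes, shuffle_buffer=32):
    """Build a shuffled, batched dataset of (image, label) pairs."""
    dataset = tf.data.Dataset.from_tensor_slices((filenames, classes))
    dataset = dataset.shuffle(shuffle_buffer)
    dataset = dataset.map(load_example, num_parallel_calls=tf.data.AUTOTUNE)
    dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
    return dataset.prefetch(tf.data.AUTOTUNE)


# Hypothetical stand-in for get_image_data(): write 8 random 128x128 JPEGs.
tmp_dir = tempfile.mkdtemp()
filenames, classes = [], []
for i in range(8):
    pixels = tf.cast(
        tf.random.uniform([128, 128, 3], maxval=256, dtype=tf.int32), tf.uint8)
    path = os.path.join(tmp_dir, "img_%d.jpg" % i)
    tf.io.write_file(path, tf.io.encode_jpeg(pixels))
    filenames.append(path)
    classes.append(i % 2)

# One pass over the dataset corresponds to one epoch.
for x_batch, y_batch in input_pipeline(filenames, classes):
    print(x_batch.shape, y_batch.shape)
```

With this approach there are no queue runners or coordinators to start, so the ordering problem from the question cannot occur.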

Answer 2

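Your code is mostly correct; only a small change is needed to make it work. It fails because you start the queue runners before you declare the queues, so nothing ever fills them and the dequeue blocks forever. If you look at the return value of start_queue_runners, you will see that the list is empty.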
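The ordering issue can be reproduced in a small sketch. It reuses the tensorflow_queue helper from the question on a list of integers, so no image files are needed. It is written against tensorflow.compat.v1 on the assumption that TensorFlow 2.x is installed (on 1.x a plain import tensorflow as tf should behave the same way). BATCH_SIZE and the label values are made up for the example.

```python
import tensorflow.compat.v1 as tf

BATCH_SIZE = 4  # assumed value; not given in the question


def tensorflow_queue(lst, dtype, capacity=32):
    # Same helper as in the question: it registers a QueueRunner on the graph.
    tensor = tf.convert_to_tensor(lst, dtype=dtype)
    fq = tf.FIFOQueue(capacity=capacity, dtypes=dtype, shapes=(()))
    fq_enqueue_op = fq.enqueue_many([tensor])
    tf.train.add_queue_runner(tf.train.QueueRunner(fq, [fq_enqueue_op]))
    return fq


with tf.Graph().as_default():
    with tf.Session() as sess:
        coord = tf.train.Coordinator()

        # Order used in the question: no queue runner is registered yet,
        # so no threads are started.
        threads_too_early = tf.train.start_queue_runners(sess=sess, coord=coord)
        print('threads started too early:', len(threads_too_early))

        # Corrected order: build the pipeline first, then start the runners.
        classes_queue = tensorflow_queue([0, 1, 2, 3, 4, 5, 6, 7], tf.int32)
        label_batch = classes_queue.dequeue_many(BATCH_SIZE)
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        print('threads started after building:', len(threads))

        y_batch = sess.run(label_batch)  # returns instead of hanging
        print(y_batch)

        # Shut the runner threads down cleanly.
        coord.request_stop()
        coord.join(threads)
```

Applied to the code in the question, the equivalent change would be to move the call to input_pipeline(filenames, classes) above the tf.train.start_queue_runners(coord=coord) line.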

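That said, Alexander's suggestion in Answer 1 is a good one. tf.data is the way to get a high-performance input pipeline. Also, queue runners are not compatible with the new TF Eager mechanism.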
