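I'm currently trying to write a TensorFlow data input pipeline using TensorFlow queues. My data consists of JPG images with three channels (RGB), each 128x128 pixels.
My current problem is running my image_batch op: it keeps hanging, and I'm not sure why.
Below is the code I use to build the input pipeline.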
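I'm using three main functions:
read_my_file_format
Takes a filename_queue, tries to load the file, and resizes it.
tensorflow_queue
Takes a list of objects and builds a TensorFlow FIFO queue from it. The queue's enqueue op is then wrapped in a QueueRunner, which is registered with tf.train.add_queue_runner.
shuffle_queue_batch
Returns an op that fetches a batch of images and labels.
Here is my code: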
import tensorflow as tf

def read_my_file_format(filename_queue):
    reader = tf.WholeFileReader()
    filename, image_string = reader.read(filename_queue)
    image = tf.image.decode_jpeg(image_string, channels=3)
    image = tf.image.resize_images(image, size=[256, 256])
    return image

def tensorflow_queue(lst, dtype, capacity=32):
    tensor = tf.convert_to_tensor(lst, dtype=dtype)
    fq = tf.FIFOQueue(capacity=capacity, dtypes=dtype, shapes=(()))
    fq_enqueue_op = fq.enqueue_many([tensor])
    tf.train.add_queue_runner(tf.train.QueueRunner(fq, [fq_enqueue_op]*1))
    return fq

def shuffle_queue_batch(image, label, batch_size, capacity=32, min_after_dequeue=10, threads=1):
    tensor_list = [image, label]
    dtypes = [tf.float32, tf.int32]
    shapes = [image.get_shape(), label.get_shape()]
    rand_shuff_queue = tf.RandomShuffleQueue(
        capacity=capacity,
        min_after_dequeue=min_after_dequeue,
        dtypes=dtypes,
        shapes=shapes
    )
    rand_shuff_enqueue_op = rand_shuff_queue.enqueue(tensor_list)
    tf.train.add_queue_runner(tf.train.QueueRunner(rand_shuff_queue, [rand_shuff_enqueue_op] * threads))
    image_batch, label_batch = rand_shuff_queue.dequeue_many(batch_size)
    return image_batch, label_batch

def input_pipeline(filenames, classes, min_after_dequeue=10):
    filename_queue = tf.train.string_input_producer(filenames, shuffle=False)
    classes_queue = tensorflow_queue(classes, tf.int32)
    image = read_my_file_format(filename_queue)
    label = classes_queue.dequeue()
    image_batch, label_batch = shuffle_queue_batch(image, label, BATCH_SIZE, min_after_dequeue=min_after_dequeue)
    return image_batch, label_batch

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # get_image_data returns:
    # filenames is a list of strings of the filenames
    # classes is a list of ints
    # datasize = number of images in dataset
    filenames, classes, datasize = get_image_data()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    image_batch, label_batch = input_pipeline(filenames, classes)
    print('Starting training')
    for ep in range(NUM_EPOCHS):
        total_loss = 0
        for _ in range(datasize // BATCH_SIZE * BATCH_SIZE):
            print('fetching batch')
            x_batch = sess.run([image_batch])
            print('x batch')
            y_batch = sess.run([label_batch])
            x_batch, y_batch = sess.run([image_batch, label_batch])
Thanks in advance.
Answer 0 (score: 0)
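I strongly recommend switching your input pipeline from tf.train queues to tf.data. Queue-based input pipelines are inefficient and hard to maintain.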
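As a rough illustration of that suggestion, here is a minimal sketch of how the question's pipeline might look with tf.data. It assumes TensorFlow 2.x (2.4 or later for tf.data.AUTOTUNE) running eagerly; IMAGE_SIZE follows the 128x128 size from the question, BATCH_SIZE is an arbitrary illustrative value, and write_fake_jpegs is a hypothetical helper that only exists so the sketch can run without real data. Filenames and labels are sliced together into one dataset, so each image stays paired with its own label.

```python
import os
import tempfile

import tensorflow as tf

IMAGE_SIZE = 128  # assumption: 128x128 RGB JPEGs, as described in the question
BATCH_SIZE = 4    # hypothetical value, for illustration only


def load_image(filename, label):
    """Reads one JPEG file and returns a (float32 image, label) pair."""
    image_string = tf.io.read_file(filename)
    image = tf.io.decode_jpeg(image_string, channels=3)
    # tf.image.resize returns float32 and fixes the static shape to [128, 128, 3].
    image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])
    return image, label


def input_pipeline(filenames, classes, batch_size=BATCH_SIZE):
    """Builds a shuffled, batched dataset of (images, labels)."""
    # Slicing both lists together keeps each filename paired with its label.
    ds = tf.data.Dataset.from_tensor_slices((filenames, classes))
    ds = ds.shuffle(buffer_size=len(filenames))  # reshuffles on every pass by default
    ds = ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.batch(batch_size)
    return ds.prefetch(tf.data.AUTOTUNE)


def write_fake_jpegs(directory, n):
    """Hypothetical helper: writes n solid-colour JPEGs so the sketch runs standalone."""
    paths = []
    for i in range(n):
        pixels = tf.cast(tf.fill([IMAGE_SIZE, IMAGE_SIZE, 3], i * 10), tf.uint8)
        path = os.path.join(directory, f"img_{i}.jpg")
        tf.io.write_file(path, tf.io.encode_jpeg(pixels))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_fake_jpegs(tmp, 6)
        dataset = input_pipeline(paths, list(range(6)))
        for epoch in range(2):
            # Each full iteration over the dataset is one epoch; no threads or coordinator needed.
            for images, labels in dataset:
                print(epoch, images.shape, labels.numpy())
```

There are no queue runners to start, so the ordering mistake in the question cannot occur; an epoch simply ends when the Python loop over the dataset finishes.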
Answer 1 (score: 0)
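Your code is mostly correct; it only needs a small change to work. The reason it hangs is that you start the queue runners before you declare the queues. tf.train.start_queue_runners only starts the runners that are already registered in the graph when it is called, and input_pipeline() creates and registers them afterwards, so nothing ever fills the queues and sess.run(image_batch) blocks forever. If you look at the return value of start_queue_runners, you will see that the list is empty.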
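The ordering effect can be reproduced on its own. This is a minimal sketch assuming TensorFlow 2.x, with the TF1 queue APIs used through tf.compat.v1 in graph mode; build_toy_pipeline and count_started_threads are hypothetical helpers, where build_toy_pipeline stands in for the question's input_pipeline with a single FIFO queue fed by one QueueRunner.

```python
import tensorflow as tf

# Assumption: TF 2.x, using the TF1 queue APIs via tf.compat.v1 in graph mode.
tf.compat.v1.disable_eager_execution()


def build_toy_pipeline():
    """Hypothetical stand-in for input_pipeline(): one queue fed by one QueueRunner."""
    queue = tf.queue.FIFOQueue(capacity=4, dtypes=[tf.int32], shapes=[[]])
    enqueue_op = queue.enqueue_many([tf.constant([1, 2, 3, 4])])
    # Registers the runner in the graph's QUEUE_RUNNERS collection (does not start it).
    tf.compat.v1.train.add_queue_runner(tf.compat.v1.train.QueueRunner(queue, [enqueue_op]))
    return queue.dequeue()


def count_started_threads(build_before_start):
    """Returns (number of threads started, first dequeued value or None)."""
    graph = tf.Graph()
    with graph.as_default(), tf.compat.v1.Session() as sess:
        coord = tf.compat.v1.train.Coordinator()
        if build_before_start:
            # Correct order: the runner is registered, so start_queue_runners finds it.
            value = build_toy_pipeline()
            threads = tf.compat.v1.train.start_queue_runners(sess=sess, coord=coord)
        else:
            # Order used in the question: the collection is still empty at this point.
            threads = tf.compat.v1.train.start_queue_runners(sess=sess, coord=coord)
            value = build_toy_pipeline()
        # Only dequeue when a thread is feeding the queue; otherwise sess.run blocks forever.
        first = sess.run(value) if threads else None
        coord.request_stop()
        coord.join(threads)
    return len(threads), first


print(count_started_threads(build_before_start=False))  # expected: (0, None)
print(count_started_threads(build_before_start=True))
```

The wrong-order case deliberately skips the dequeue, because running it would hang exactly like image_batch does in the question.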
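That said, Alexander's suggestion is a good one: tf.data is the way to get a high-performance input pipeline. Queue runners are also incompatible with TensorFlow's newer eager execution mode.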
Here is the relevant code, with input_pipeline() now called before start_queue_runners():
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # get_image_data returns:
    # filenames is a list of strings of the filenames
    # classes is a list of ints
    # datasize = number of images in dataset
    filenames, classes, datasize = get_image_data()
    # Build the pipeline first, so its QueueRunners are registered before they are started.
    image_batch, label_batch = input_pipeline(filenames, classes)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    # image_batch, label_batch = input_pipeline(filenames, classes)  # old position: too late
    print('Starting training')
    for ep in range(NUM_EPOCHS):
        total_loss = 0
        for _ in range(datasize // BATCH_SIZE * BATCH_SIZE):
            print('fetching batch')
            x_batch = sess.run([image_batch])
            print('x batch')
            y_batch = sess.run([label_batch])
            x_batch, y_batch = sess.run([image_batch, label_batch])
    # Stop the queue-runner threads cleanly before the session closes.
    coord.request_stop()
    coord.join(threads)