Using queues in TensorFlow to load data from a text file

Date: 2017-01-05 11:22:25

Tags: python python-3.x queue tensorflow neural-network

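I'm trying to run a very simple neural network in TensorFlow that learns to classify images. It is deliberately simple for now because I'm still learning the framework.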

Right now I'm struggling with loading the data: it lives in a TXT file where each line contains a photo ID and a binary number used as the label.
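A minimal sketch of parsing that format in plain Python, assuming each line looks like ID,label with a 0 or 1 after the comma (the comma separator is inferred from the split(',') in the question's code; parse_list_file is a hypothetical helper name). One detail worth checking when parsing such labels: bool() of any non-empty string is True, so bool('0') is True and the digit has to go through int() first.

```python
# parse_list_file is a hypothetical helper, not part of the question's code.
def parse_list_file(lines, image_dir):
    """Turn 'ID,label' lines into JPEG paths and [bool] labels."""
    filenames, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing empty line
        # assumed format: photo ID, a comma, then 0 or 1
        photo_id, label = line.split(',')
        filenames.append(image_dir + photo_id + '.jpg')
        # bool('0') is True (non-empty string), so parse the digit first
        labels.append([bool(int(label))])
    return filenames, labels


# Usage with in-memory lines instead of a real list.txt:
paths, flags = parse_list_file(['1001,0\n', '1002,1\n', '\n'], './data/train/')
print(paths)  # ['./data/train/1001.jpg', './data/train/1002.jpg']
print(flags)  # [[False], [True]]
```

The [bool] wrapping mirrors the question's labels_list, which stores each label as a one-element list so the label tensor has shape [N, 1].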

Here is my code so far (I removed the irrelevant parts):

import tensorflow as tf

IMAGE_WIDTH = 240
IMAGE_HEIGHT = 180
NUMBER_OF_CHANNELS = 3
SOURCE_DIR = './data/'
TRAINING_IMAGES_DIR = SOURCE_DIR + 'train/'
LIST_FILE_NAME = 'list.txt'
BATCH_SIZE = 100
TRAINING_SET_SIZE = 15873

def create_photo_and_label_batches(source_directory):
  # read the list of photo IDs and labels
  filenames_list = []
  labels_list = []
  with open(source_directory + LIST_FILE_NAME, 'r') as photos_list:
    # get lists of photo file names and labels
    for line in photos_list:
      photo_id, label = line.strip().split(',')
      filenames_list.append(source_directory + photo_id + '.jpg')
      # bool('0') would be True, so convert the digit with int() first
      labels_list.append([bool(int(label))])
  # convert the lists to tensors
  filenames = tf.convert_to_tensor(filenames_list, dtype=tf.string)
  labels = tf.convert_to_tensor(labels_list, dtype=tf.bool)
  # create queue with filenames and labels
  file_names_queue, labels_queue = tf.train.slice_input_producer(
      [filenames, labels], num_epochs=1, shuffle=True)
  # convert filenames of photos to input vectors
  photos_queue = tf.read_file(file_names_queue)  # convert filenames to content
  photos_queue = tf.image.decode_jpeg(photos_queue, channels=NUMBER_OF_CHANNELS)
  photos_queue.set_shape([IMAGE_HEIGHT, IMAGE_WIDTH, NUMBER_OF_CHANNELS])
  photos_queue = tf.to_float(photos_queue)  # convert uint8 to float32
  photos_queue = tf.reshape(photos_queue, [-1]) # flatten the tensor
  # slice the data into mini batches
  return tf.train.batch([photos_queue, labels_queue], batch_size=BATCH_SIZE)

def main(_):
  # load the training set
  training_photo_batch, training_label_batch = create_photo_and_label_batches(
      TRAINING_IMAGES_DIR)

  # create the model
  x = training_photo_batch
  W = tf.Variable(tf.zeros([IMAGE_WIDTH * IMAGE_HEIGHT * NUMBER_OF_CHANNELS, 1],
     dtype=tf.float32))  # weights tensor
  b = tf.Variable(tf.zeros([1], dtype=tf.float32))  # bias
  y_ = training_label_batch
  y = tf.matmul(x, W) + b

  # define loss and optimizer
  cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))
  train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

  # do the training
  sess = tf.InteractiveSession()
  tf.initialize_all_variables().run()
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)
  for i in range(TRAINING_SET_SIZE // BATCH_SIZE):
    sess.run(train_step)

  # stop the queue threads and properly close the session
  coord.request_stop()
  coord.join(threads)
  sess.close()

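As you can see, the network is very simple, with only one neuron. I was inspired by the code listed here: Tensorflow read images with labels. When I run the code, I get the following error on the first iteration: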

tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 100, current size 0)
 [[Node: batch = QueueDequeueMany[_class=["loc:@batch/fifo_queue"], component_types=[DT_FLOAT, DT_BOOL], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]

I've been trying to solve this for several hours now. So far I have checked that:

  • filenames_list and labels_list are loaded correctly
  • the shapes of the tensors (x, y, y_, W and b) are correct
  • the TensorFlow graph is built correctly and is visible in TensorBoard.

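I don't know what else to check. It seems there is something about queues in TensorFlow that I don't understand, but I can't tell what exactly. Thanks in advance for your help!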

1 answer:

Answer 0 (score: 2)

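This may be caused by num_epochs=1 in tf.train.slice_input_producer([filenames, labels], num_epochs=1, shuffle=True). Check the API documentation for slice_input_producer, which explains: "num_epochs: An integer (optional). If specified, slice_input_producer produces each slice num_epochs times before generating an OutOfRange error."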
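A likely mechanism behind this, offered as a hedged reading: the same API documentation notes that when num_epochs is set, the producer creates a local counter variable named epochs and that local_variables_initializer() should be used to initialize it. The question's code only runs tf.initialize_all_variables(), which does not cover local variables, so the producer thread would fail at once and the batch queue would be closed while still empty, which matches "current size 0" on the very first step. Genuinely running out of data is unlikely here anyway: 15873 // 100 = 158 steps consume only 15,800 examples, less than one epoch. Below is a minimal sketch of the pattern under stated assumptions: it uses toy integers instead of JPEG file names so it runs without image files, targets the tf.compat.v1 API (assumes TensorFlow 1.15 or 2.x), and consume_one_epoch is a hypothetical helper name.

```python
import tensorflow.compat.v1 as tf

# Queue runners need TF1 graph-mode semantics (no eager execution).
tf.disable_v2_behavior()


def consume_one_epoch(values, labels, batch_size):
    """Hypothetical helper: read every (value, label) pair once via queues.

    Returns the values in the order they were dequeued.
    """
    seen = []
    with tf.Graph().as_default():
        value_t = tf.convert_to_tensor(values, dtype=tf.int32)
        label_t = tf.convert_to_tensor(labels, dtype=tf.bool)
        # num_epochs=1 makes the producer keep a *local* variable `epochs`
        value, label = tf.train.slice_input_producer(
            [value_t, label_t], num_epochs=1, shuffle=True)
        # allow_smaller_final_batch keeps the last, partial batch of the epoch
        value_batch, label_batch = tf.train.batch(
            [value, label], batch_size=batch_size,
            allow_smaller_final_batch=True)
        # The global initializer skips local variables, so run both.
        init_op = tf.group(tf.global_variables_initializer(),
                           tf.local_variables_initializer())

        with tf.Session() as sess:
            sess.run(init_op)
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=sess, coord=coord)
            try:
                while not coord.should_stop():
                    batch_values, _ = sess.run([value_batch, label_batch])
                    seen.extend(batch_values.tolist())
            except tf.errors.OutOfRangeError:
                pass  # expected: the single epoch has been fully consumed
            finally:
                coord.request_stop()
                coord.join(threads)
    return seen


seen = consume_one_epoch(list(range(10)), [v % 2 == 1 for v in range(10)], 4)
print(sorted(seen))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Applied to the question's main(), the equivalent change would be running tf.local_variables_initializer() next to the existing initializer (or dropping num_epochs=1, as suggested above) and wrapping the sess.run(train_step) loop in the same try/except tf.errors.OutOfRangeError pattern, so the end of the epoch stops training cleanly instead of surfacing as an error.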