TensorFlow batch training OutOfRangeError

Asked: 2017-06-09 14:10:17

Tags: tensorflow training-data outofrangeexception

Saving variables
Variables saved in 0.88 seconds
Saving metagraph
Metagraph saved in 35.81 seconds
Saving variables
Variables saved in 0.95 seconds
Saving metagraph
Metagraph saved in 33.20 seconds
Traceback (most recent call last):
Caused by op u'batch', defined at:
  File "ava_train.py", line 155, in <module>
    image_batch, label_batch = tf.train.batch([image, label], batch_size=batch_size, allow_smaller_final_batch=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 872, in batch
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 665, in _batch
    dequeued = queue.dequeue_up_to(batch_size, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 510, in dequeue_up_to
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1402, in _queue_dequeue_up_to_v2
    timeout_ms=timeout_ms, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 100, current size 0)
     [[Node: batch = QueueDequeueUpToV2[component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]
  

My code is here:

with tf.Graph().as_default():
     global_step = tf.Variable(0, trainable=False)
     # process same as cifar10.distorted_inputs
     log_dir =  '../log'
     model_dir = '../model'
     max_num_epoch = 80
     if not os.path.exists(log_dir):
         os.makedirs(log_dir)
     if not os.path.exists(model_dir):
         os.makedirs(model_dir)
     num_train_example = len(os.listdir('../images/'))
     # Reads paths of images together with their labels
     image_list, label_list = read_labeled_image_list('../raw.txt')
     images = ops.convert_to_tensor(image_list, dtype=dtypes.string)
     labels = ops.convert_to_tensor(label_list, dtype=dtypes.int32)
     # Makes an input queue
     # input_queue = tf.train.slice_input_producer([images, labels], num_epochs=max_num_epoch, shuffle=True)
     input_queue = tf.train.slice_input_producer([images, labels], shuffle=True)
     image, label = read_images_from_disk(input_queue)
     image_size = 240
     keep_probability = 0.8
     weight_decay = 5e-5
     image = preprocess(image, image_size, image_size, None)
     batch_size = 100
     epoch_size = 1000
     embedding_size = 128
     # Optional Image and Label Batching
     image_batch, label_batch = tf.train.batch([image, label], batch_size=batch_size, allow_smaller_final_batch=True)

This is the output of training an image-classification model on 200k images. I set allow_smaller_final_batch=True in tf.train.batch. The OutOfRangeError occurs after a few epochs.

I don't know the cause and would appreciate your help.

2 Answers:

Answer 0 (score: 0)

Since you are getting an OutOfRangeError, you are probably training for more epochs than max_num_epoch, which causes slice_input_producer to raise this exception.

One possible workaround is to remove num_epochs=max_num_epoch from slice_input_producer, so that it can keep producing examples even after the maximum number of epochs has been reached.
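Conversely, if you want to keep num_epochs, note that the producer's queue is closed after the final epoch, so the OutOfRangeError is the normal end-of-data signal rather than a failure. The TF 1.x idiom is to catch it around the training loop. This is a sketch; it assumes a train_op and loss built from the question's image_batch/label_batch, which are not shown in the post:

```python
import tensorflow as tf

# Sketch (TF 1.x queue-runner API): run training until the input
# producer's queue is exhausted, treating OutOfRangeError as "done".
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # num_epochs is tracked in a local variable, so this is required:
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            _, loss_value = sess.run([train_op, loss])
    except tf.errors.OutOfRangeError:
        print('Done training: input queue exhausted after max_num_epoch epochs.')
    finally:
        coord.request_stop()
    coord.join(threads)
```

Forgetting tf.local_variables_initializer() when num_epochs is set is itself a common cause of this exact error, because the epoch counter is a local variable.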

Answer 1 (score: 0)

I have struggled with this particular error for days, and I finally found the cause. You get this error because a file is corrupted somewhere. Try running this code on a different set of training and test data.