TensorFlow im2txt training fails with "truncated record at"

Time: 2016-10-20 09:06:24

Tags: python tensorflow

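im2txt trains for a few thousand steps and then stops with the error below. I checked the training files and they look fine.

Running on Ubuntu 16.04, TF r0.11, in GPU mode on a GTX 970 with 4 GB.

Could this be caused by running out of RAM?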

INFO:tensorflow:global step 56396: loss = 2.4654 (0.41 sec/step)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors.DataLossError'>, truncated record at 369740238
     [[Node: ReaderRead = ReaderRead[_class=["loc:@TFRecordReader", "loc:@filename_queue"], _device="/job:localhost/replica:0/task:0/cpu:0"](TFRecordReader, filename_queue)]]

Caused by op u'ReaderRead', defined at:
  File "/home/john/Developer/tensorflow/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 114, in <module>
    tf.app.run()
  File "/home/john/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/home/john/Developer/tensorflow/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 65, in main
    model.build()
  File "/home/john/Developer/tensorflow/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/show_and_tell_model.py", line 352, in build
    self.build_inputs()
  File "/home/john/Developer/tensorflow/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/show_and_tell_model.py", line 153, in build_inputs
    num_reader_threads=self.config.num_input_reader_threads)
  File "/home/john/Developer/tensorflow/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/ops/inputs.py", line 115, in prefetch_input_data
    _, value = reader.read(filename_queue)
  File "/home/john/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/io_ops.py", line 277, in read
    return gen_io_ops._reader_read(self._reader_ref, queue_ref, name=name)
  File "/home/john/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 211, in _reader_read
    queue_handle=queue_handle, name=name)
  File "/home/john/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 748, in apply_op
    op_def=op_def)
  File "/home/john/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2403, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/john/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1305, in __init__
    self._traceback = _extract_stack()

DataLossError (see above for traceback): truncated record at 369740238
     [[Node: ReaderRead = ReaderRead[_class=["loc:@TFRecordReader", "loc:@filename_queue"], _device="/job:localhost/replica:0/task:0/cpu:0"](TFRecordReader, filename_queue)]]

INFO:tensorflow:global step 56397: loss = 2.5540 (0.40 sec/step)

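1 answer:

Answer 0 (score: 0)

I have the same problem and don't know why. I didn't see any errors when creating the TFRecords. During training, the error shows up at the end of the records. By the way, I am using 0.11rc.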
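One way to narrow this down is to check whether any shard on disk is actually cut short. The sketch below is not part of im2txt or TensorFlow. It is a hypothetical helper (`find_truncated_record` is a made-up name) that walks a file using the documented TFRecord framing: an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload. It only checks that every record fits inside the file and does not verify the CRCs, so it can confirm truncation but cannot rule out other corruption.

```python
import os
import struct


def find_truncated_record(path):
    """Return the byte offset of the first record that runs past the end of
    the file, or None if every record fits. CRCs are NOT verified."""
    size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as f:
        while offset < size:
            f.seek(offset)
            # Header: uint64 payload length + uint32 masked CRC of the length.
            header = f.read(12)
            if len(header) < 12:
                return offset
            (length,) = struct.unpack("<Q", header[:8])
            # Full record: header + payload + uint32 masked CRC of the payload.
            end = offset + 12 + length + 4
            if end > size:
                return offset
            offset = end
    return None
```

Running this over each training shard and printing the files where the result is not `None` should show whether a file was left incomplete, for example by an interrupted conversion run or a full disk. If a shard is reported, regenerating that shard is probably the simplest fix. If none is reported, the cause is likely elsewhere.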