TensorFlow session run() hangs even after starting the queue runners. Why?

Time: 2018-09-06 10:21:23

Tags: tensorflow

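My program tries to read a tfrecords file from disk. I use TensorFlow's queue API, but it hangs on the sess.run() line and I don't know why, since I have already started the queue runners. My program is as follows: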

import tensorflow as tf  # TensorFlow 1.x queue-based input API

batch_size = 512

def decode_tfr(filename, train=True):
    # Accept either a single path or a list of paths.
    if not isinstance(filename, list):
        filename = [filename]
    shuffle = train
    capacity = 16 if train else 1
    # num_epochs=None cycles through the file list indefinitely.
    num_epochs = None if train else 1
    filename_queue = tf.train.string_input_producer(filename, shuffle=shuffle,
                                                    num_epochs=num_epochs,
                                                    capacity=capacity)
    # Queue of serialized examples, shuffled only during training.
    if train:
        examples_queue = tf.RandomShuffleQueue(
            capacity=batch_size * 8,
            min_after_dequeue=batch_size * 2,
            dtypes=[tf.string])
    else:
        examples_queue = tf.FIFOQueue(
            capacity=batch_size * 8,
            dtypes=[tf.string])

    # Each reader pulls records from the filename queue into the examples queue.
    enqueue_ops = []
    num_readers = 1
    for _ in range(num_readers):
        reader = tf.TFRecordReader()
        _, value = reader.read(filename_queue)
        enqueue_ops.append(examples_queue.enqueue([value]))

    tf.train.queue_runner.add_queue_runner(
        tf.train.queue_runner.QueueRunner(examples_queue, enqueue_ops))
    example_serialized = examples_queue.dequeue()

    # parse_example_proto is my own parsing helper (definition not shown here).
    num_preprocess_threads = 10
    items = []
    for thread_id in range(num_preprocess_threads):
        record = parse_example_proto(example_serialized)
        items.append(list(record))

    record = tf.train.batch_join(
        items,
        batch_size=batch_size,
        dynamic_pad=True,
        capacity=2 * num_preprocess_threads * batch_size)
    return record

Then, in the main block, my code is as follows:

filename = './test.tfr'

with tf.device('/cpu:0'):
    items = decode_tfr(filename)

config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

init = tf.global_variables_initializer()
sess.run(init)
# Local variables back the epoch counter used when num_epochs is set.
sess.run(tf.local_variables_initializer())

coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord, start=True)
print("start...")
try:
    while not coord.should_stop():
        for i in range(1):
            print("before run...")
            print(items[0])
            a = sess.run(items[0])  # the program hangs on this call
            print("after run...")
        break
except tf.errors.OutOfRangeError:
    # Raised when the input queues are exhausted.
    print("Done!")
finally:
    coord.request_stop()

coord.join(threads)
quit()

When I run the program, it hangs at the sess.run() line. I don't understand why, since I started the queue runners as described on the TensorFlow website. Many thanks.

Using the top command, I found that this process's CPU utilization is above 300%.

1 Answer:

Answer 0: (score: 0)

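I finally figured it out myself. The test.tfr file was empty, so sess.run() hung forever! Presumably, because num_epochs is None in training mode, the filename queue keeps supplying the same empty file, the reader never produces a record, the examples queue never fills, and the dequeue behind sess.run() blocks indefinitely.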
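
One way to rule this out before building the input pipeline is to count the records in the file first. The sketch below is a minimal, standard-library-only check, assuming the usual TFRecord on-disk layout (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload). The name `count_tfrecords` is a hypothetical helper, not part of TensorFlow, and it does not verify the CRC fields.

```python
import os
import struct
import tempfile


def count_tfrecords(path):
    """Count records in a TFRecord file by walking its length headers.

    Assumed layout per record: uint64 length, uint32 length-CRC,
    `length` payload bytes, uint32 payload-CRC. CRCs are skipped, not checked.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated length header in %s" % path)
            (length,) = struct.unpack("<Q", header)
            # Skip the length-CRC (4 bytes), the payload, and the payload-CRC (4 bytes).
            expected = 4 + length + 4
            if len(f.read(expected)) != expected:
                raise ValueError("truncated record in %s" % path)
            count += 1
    return count


if __name__ == "__main__":
    # Demo on a freshly created empty file, mirroring the empty test.tfr case.
    fd, demo_path = tempfile.mkstemp(suffix=".tfr")
    os.close(fd)
    try:
        print(count_tfrecords(demo_path))  # an empty file contains 0 records
    finally:
        os.remove(demo_path)
```

Calling `count_tfrecords('./test.tfr')` before `decode_tfr` and failing fast when it returns 0 would turn the silent hang into an immediate, readable error.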