Background-running queue in TensorFlow causes strange exceptions

Time: 2016-06-13 19:32:19

Tags: machine-learning tensorflow

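I implemented a graph like this in TensorFlow: there is a queue Q, and a background thread enqueues tensors into it. In the main thread, I dequeue elements from Q one by one.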
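
The design described above is a standard producer/consumer setup. As a framework-neutral sketch, here is the same structure using Python's standard `queue.Queue` and `threading.Event` in place of `tf.FIFOQueue` and `tf.train.Coordinator`. The function name `run_pipeline` is hypothetical, and because no TensorFlow graph is involved, this sketch does not reproduce the TensorFlow-specific error discussed below:

```python
import queue
import threading


def run_pipeline(num_items=20, capacity=32):
    """Background thread enqueues integers; the main thread dequeues them."""
    q = queue.Queue(maxsize=capacity)   # bounded, like FIFOQueue(32, ...)
    stop = threading.Event()            # plays the role of a Coordinator

    def producer():
        for i in range(num_items):
            if stop.is_set():           # analogous to coord.should_stop()
                return
            q.put(i)                    # blocks while the queue is full

    worker = threading.Thread(target=producer)
    worker.start()

    # Main thread: dequeue items one by one; get() blocks until an item exists.
    results = [q.get() for _ in range(num_items)]

    stop.set()                          # analogous to coord.request_stop()
    worker.join()                       # analogous to coord.join(threads)
    return results


if __name__ == "__main__":
    print(run_pipeline(5))
```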

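My code can be simplified as follows. As the comment near the end notes, if I sleep for 1 second before the dequeue operation, everything works. However, if the dequeue runs immediately, an exception is raised (the traceback follows the code).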

import time
import threading
import tensorflow as tf

sess = tf.InteractiveSession()
coord = tf.train.Coordinator()

q = tf.FIFOQueue(32, dtypes=tf.int32)

def loop(g):
    with g.as_default():
        enqueue_op = q.enqueue(1, name="example_enqueue")

        for i in range(20):
            if coord.should_stop():
                return

            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")

threads = [
    threading.Thread(target=loop, args=(tf.get_default_graph(),))
]

sess.run(tf.initialize_all_variables())

for t in threads: t.start()

# If I sleep for 1 second, it works fine!
# time.sleep(1)

print(sess.run(q.dequeue()))

coord.request_stop()
coord.join(threads)

sess.close()

When the dequeue runs immediately, I get:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 715, in _do_call
    return fn(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 697, in _run_fn
    status, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/hanxu/Downloads/BrainSeg/playgrounds/7.py", line 32, in <module>
    print(sess.run(q.dequeue()))
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 372, in run
    run_metadata_ptr)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 636, in _run
    feed_dict_string, options, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
    target_list, options, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found

Can anyone help? Thanks a lot!!

Update

I'm using TensorFlow 0.9.0rc0.

My actual situation is a bit more complex: the tensor I enqueue is different every time.

So moving the enqueue op into the main thread is not trivial :( I don't know how to do it. Please help :)

1 Answer

Answer 0 (score: 2)

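This is an issue with older versions of TensorFlow (prior to 0.9) that was fixed in version 0.9. The problem is that adding nodes to the graph (i.e., your call to q.enqueue() in loop()) is not thread-safe while another thread (i.e., the main thread, which calls q.dequeue()) is using the graph.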
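
To make "not thread-safe" concrete, here is a toy model in plain Python. The `ToyGraph` class, `stress`, and `is_consistent` are hypothetical and do not describe TensorFlow's internals. Adding a node takes two steps, so an unguarded reader that looked at the structure between them could see a node whose output is missing, loosely analogous to the `FetchOutputs node ... not found` error. Guarding writer and reader with one `threading.Lock` makes every snapshot consistent. This only illustrates the failure mode; the fix in the answer avoids concurrent graph mutation altogether rather than adding a lock:

```python
import threading


class ToyGraph:
    """Hypothetical shared graph: node names plus a map of their outputs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._names = []        # step 1 of adding a node
        self._outputs = {}      # step 2 of adding a node

    def add_node(self, name):
        # Both steps happen under the lock, so no reader sees a half-added node.
        with self._lock:
            self._names.append(name)
            self._outputs[name] = name + ":0"

    def snapshot(self):
        # Copy names and outputs atomically with respect to add_node().
        with self._lock:
            return list(self._names), dict(self._outputs)


def is_consistent(snap):
    """True if every listed node also has a registered output."""
    names, outputs = snap
    return all(n in outputs for n in names)


def stress(num_writers=4, nodes_per_writer=200, num_reads=500):
    g = ToyGraph()
    snapshots = []

    def writer(w):
        for i in range(nodes_per_writer):
            g.add_node(f"w{w}_n{i}")

    threads = [threading.Thread(target=writer, args=(w,)) for w in range(num_writers)]
    for t in threads:
        t.start()
    for _ in range(num_reads):          # the main thread reads concurrently
        snapshots.append(g.snapshot())
    for t in threads:
        t.join()
    return g, snapshots


if __name__ == "__main__":
    g, snaps = stress()
    print(all(is_consistent(s) for s in snaps), len(g.snapshot()[0]))
```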

To avoid the race condition (in versions prior to 0.9), you need to fix two things:

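  1. Do not call q.enqueue() in the loop() thread. Create the enqueue op in the main thread instead. For example:

    q = tf.FIFOQueue(32, dtypes=tf.int32)
    enqueue_op = q.enqueue(1, name="example_enqueue")

    def loop(g):
        for i in range(20):
            if coord.should_stop():
                return
            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")

  2. Move the call to q.dequeue() (which adds a node to the graph) so that it happens before you start the loop() thread:

    dequeue_op = q.dequeue()

    for t in threads: t.start()

    print(sess.run(dequeue_op))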
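
For the update's case, where the value to enqueue changes on every iteration, a common TensorFlow idiom (a suggestion beyond the original answer) is to create a single enqueue op on a placeholder in the main thread, e.g. `x = tf.placeholder(tf.int32, shape=[])` and `enqueue_op = q.enqueue(x)`, then pass each new value with `sess.run(enqueue_op, feed_dict={x: value})` inside `loop()`. Calling `tf.get_default_graph().finalize()` before starting the threads makes the graph read-only, so any accidental op creation inside `loop()` fails immediately instead of racing. The plain-Python toy below models that pattern; `ToyGraph`, `FrozenGraphError`, and `demo` are hypothetical names, not TensorFlow's API:

```python
import queue
import threading


class FrozenGraphError(RuntimeError):
    """Raised when an op is added after the toy graph is finalized."""


class ToyGraph:
    """Hypothetical stand-in for a graph that can be frozen (cf. Graph.finalize())."""

    def __init__(self):
        self._ops = {}
        self._finalized = False

    def add_op(self, name, fn):
        if self._finalized:
            raise FrozenGraphError(f"graph is finalized; cannot add {name!r}")
        self._ops[name] = fn
        return name

    def finalize(self):
        self._finalized = True

    def run(self, name, feed=None):
        # Running a prebuilt op never mutates the graph, so any thread may do it.
        return self._ops[name](feed or {})


def demo(values):
    g = ToyGraph()
    q = queue.Queue(maxsize=32)

    # Build every op in the main thread, before any worker starts.
    # The "x" key in the feed plays the role of a placeholder.
    enqueue_op = g.add_op("enqueue", lambda feed: q.put(feed["x"]))
    dequeue_op = g.add_op("dequeue", lambda feed: q.get())
    g.finalize()

    def loop():
        for v in values:
            g.run(enqueue_op, feed={"x": v})   # same op, different value each time

    worker = threading.Thread(target=loop)
    worker.start()
    results = [g.run(dequeue_op) for _ in values]
    worker.join()
    return g, results


if __name__ == "__main__":
    print(demo([3, 1, 4, 1, 5])[1])
```

The design choice is that the worker only ever runs ops; it never builds them, so the varying data travels through the feed rather than through new graph nodes.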