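I want to build a project in which requests are put into a Python Queue at arbitrary times, and a group of TensorFlow models consumes the requests from the queue and returns the results immediately.
The models run in different threads with different tf.Graph instances, but their structure and weight values are identical.
Each model uses tf.data.Dataset.from_generator to wrap a Python iterator that fetches requests from the queue.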
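For comparison, the request/response flow described above can be sketched in plain Python without tf.data. A shared Queue feeds N worker threads, and each request carries its own one-slot reply queue, so the caller gets its result back as soon as a worker finishes. This is only a sketch under stated assumptions: `fake_model` is a hypothetical stand-in for one model replica's `sess.run` call, and `submit`, `start_workers` and `stop_workers` are illustrative helper names, not part of any library.

```python
import threading
from queue import Queue

_STOP = object()  # sentinel that tells a worker thread to exit


def fake_model(batch):
    # Hypothetical stand-in for one model replica; a real worker would call sess.run here.
    return [x * 2 for x in batch]


def worker(requests):
    # Each worker takes the next request as soon as one is available.
    while True:
        item = requests.get()
        if item is _STOP:
            return
        batch, reply = item
        reply.put(fake_model(batch))


def start_workers(requests, n):
    threads = [threading.Thread(target=worker, args=(requests,), daemon=True)
               for _ in range(n)]
    for t in threads:
        t.start()
    return threads


def stop_workers(requests, threads):
    # One sentinel per worker, then wait for all of them to exit.
    for _ in threads:
        requests.put(_STOP)
    for t in threads:
        t.join()


def submit(requests, batch, timeout=None):
    # A private one-slot reply queue routes the result back to this caller only.
    reply = Queue(maxsize=1)
    requests.put((batch, reply))
    return reply.get(timeout=timeout)


if __name__ == "__main__":
    requests = Queue(128)
    workers = start_workers(requests, 4)
    print(submit(requests, [3] * 8, timeout=5))  # [6, 6, 6, 6, 6, 6, 6, 6]
    stop_workers(requests, workers)
```

The per-request reply queue is what makes "return immediately" well defined when several workers share one input queue: whichever worker takes a request answers exactly that caller.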
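The problem is that when there is more than one model, a request can be blocked until a later request arrives. From the test results, the Python iterator does receive each request as soon as it is put into the queue, but no result comes out of the model. Also, no requests seem to be dropped; they appear to be held up by the tf.data iterator.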
Here is my test code:
```python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
import sys
import random
import time
from queue import Queue
from concurrent.futures import ThreadPoolExecutor

thread_count = int(sys.argv[1])
request_queue = Queue(128)


def data_iter():
    # Blocks until a request is available; a None sentinel ends the stream.
    while True:
        item = request_queue.get()
        if item is None:
            return
        yield item


def task():
    # Each worker thread owns its own graph and session (one model replica).
    with tf.Graph().as_default():
        ds = tf.data.Dataset.from_generator(data_iter, tf.int32, output_shapes=[1, 8])
        sample = ds.make_one_shot_iterator().get_next()
        with tf.Session() as sess:
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=sess, coord=coord)
            while not coord.should_stop():
                try:
                    result = sess.run(sample)
                    print(result)
                except tf.errors.OutOfRangeError:
                    # The generator returned after receiving the None sentinel.
                    coord.request_stop()
            coord.join(threads)


executor = ThreadPoolExecutor(thread_count)
try:
    for i in range(thread_count):
        executor.submit(task)
    rand = random.Random()
    for i in range(100):
        request_queue.put(np.full((1, 8), i, 'int32'))
        time.sleep(1e-3)  # to let the model get the request from request_queue
        t = rand.randint(5, 10)
        print('round {}, request_queue size is about {}, sleeping {} secs...'.format(i, request_queue.qsize(), t))
        time.sleep(t)
finally:
    # One None sentinel per worker so that every generator terminates.
    for i in range(thread_count):
        request_queue.put(None)
    executor.shutdown()
```
Environment: Python 3.5.3, TensorFlow 1.4.0
Test results:
python tf_ds_test.py 1
The output is:
round 0, request_queue size is about 1, sleeping 6 secs...
2017-12-21 10:42:24.924251: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
[[0 0 0 0 0 0 0 0]]
[[1 1 1 1 1 1 1 1]]
round 1, request_queue size is about 0, sleeping 6 secs...
[[2 2 2 2 2 2 2 2]]
round 2, request_queue size is about 0, sleeping 5 secs...
[[3 3 3 3 3 3 3 3]]
round 3, request_queue size is about 0, sleeping 7 secs...
[[4 4 4 4 4 4 4 4]]
round 4, request_queue size is about 0, sleeping 6 secs...
[[5 5 5 5 5 5 5 5]]
round 5, request_queue size is about 0, sleeping 7 secs...
...
Everything works fine.
python tf_ds_test.py 32
The output is:
2017-12-21 10:45:41.660251: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
round 0, request_queue size is about 1, sleeping 9 secs...
[[0 0 0 0 0 0 0 0]]
[[1 1 1 1 1 1 1 1]]
round 1, request_queue size is about 0, sleeping 5 secs...
round 2, request_queue size is about 0, sleeping 8 secs...
round 3, request_queue size is about 0, sleeping 10 secs...
[[4 4 4 4 4 4 4 4]]
[[2 2 2 2 2 2 2 2]]
[[3 3 3 3 3 3 3 3]]
round 4, request_queue size is about 0, sleeping 8 secs...
round 5, request_queue size is about 0, sleeping 6 secs...
round 6, request_queue size is about 0, sleeping 10 secs...
[[6 6 6 6 6 6 6 6]]
[[5 5 5 5 5 5 5 5]]
round 7, request_queue size is about 0, sleeping 9 secs...
[[7 7 7 7 7 7 7 7]]
round 8, request_queue size is about 0, sleeping 5 secs...
round 9, request_queue size is about 0, sleeping 10 secs...
round 10, request_queue size is about 0, sleeping 6 secs...
round 11, request_queue size is about 0, sleeping 10 secs...
[[8 8 8 8 8 8 8 8]]
round 12, request_queue size is about 0, sleeping 8 secs...
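The requests are being blocked! The Python iterator consumes each request immediately, but the model returns no result for an indefinite period, until it receives its next request.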
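One way to check that the Python side is not the bottleneck is to time the Queue on its own, with the same 32 consumer threads and no TensorFlow. The sketch below records how long each request waits between `put()` and being taken by a consumer; `measure_pickup_latency` is a hypothetical helper, and its stand-in consumers do no model work. If these waits stay far below the multi-second gaps in the output above, the delay is most likely downstream of the Queue, on the tf.data/session side.

```python
import threading
import time
from queue import Queue


def measure_pickup_latency(n_consumers, n_requests, gap_s):
    """Return, for each request, the seconds between put() and a consumer taking it."""
    requests = Queue()
    latencies = []
    lock = threading.Lock()

    def consumer():
        while True:
            put_time = requests.get()
            if put_time is None:  # sentinel: stop this consumer
                return
            with lock:
                latencies.append(time.monotonic() - put_time)

    threads = [threading.Thread(target=consumer, daemon=True)
               for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for _ in range(n_requests):
        requests.put(time.monotonic())  # the payload is just the enqueue timestamp
        time.sleep(gap_s)
    for _ in threads:
        requests.put(None)
    for t in threads:
        t.join()
    return latencies


if __name__ == "__main__":
    lat = measure_pickup_latency(n_consumers=32, n_requests=20, gap_s=0.05)
    print("max pickup latency: {:.4f} s".format(max(lat)))
```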
Does anyone have any idea how to make these models return results immediately?
Answer (score: 0)
Could you modify the loop that generates the elements to:
```python
for i in range(100):
    request_queue.put(np.full((1, 8), i, 'int32'))
    print('round {}, queue size {}'.format(i, request_queue.qsize()))
```
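and share the output?
I tried to reproduce your problem (using a TF nightly build), but even with 1000 tasks and a 10000-iteration loop, everything still ran smoothly.
Could you try this with a TF nightly build?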