I have some code (see below) that mimics a REST API call.
For each key of an item produced by a generator, it needs to run a REST call. So in my example, a record might be
{"a": 2, "b": 36, "c": 77}
I need to run the REST call separately for each key (a, b and c) and then output the result (here the call just negates the number):
{"a": 2, "a_neg": -2, "b": 36, "b_neg": -36, "c": 77, "c_neg": -77}
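My current code works for a single key, but with several keys it repeats the items (so with 3 keys I get every result three times).
There are also some funky race conditions. I suppose I could keep only the last record, but I'm not very good with threads and I'm worried about thread safety and other advanced issues.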
Here is a sample of the output:
{'a': 89, 'a_neg': -89, 'b': 69, 'c': 38}
{'a': 89, 'a_neg': -89, 'b': 69, 'c': 38, 'c_neg': -38}
{'a': 89, 'a_neg': -89, 'b': 69, 'b_neg': -69, 'c': 38, 'c_neg': -38}
{'a': 90, 'a_neg': -90, 'b': 43, 'c': 16}
{'a': 90, 'a_neg': -90, 'b': 43, 'c': 16, 'c_neg': -16}
{'a': 90, 'a_neg': -90, 'b': 43, 'b_neg': -43, 'c': 16, 'c_neg': -16}
{'a': 91, 'a_neg': -91, 'b': 49, 'b_neg': -49, 'c': 77, 'c_neg': -77}
{'a': 91, 'a_neg': -91, 'b': 49, 'b_neg': -49, 'c': 77, 'c_neg': -77}
{'a': 91, 'a_neg': -91, 'b': 49, 'b_neg': -49, 'c': 77, 'c_neg': -77}
Finally, here is my source code (you can run it yourself):
#!/usr/bin/env python
from concurrent.futures import ThreadPoolExecutor
from time import sleep
from pprint import pprint
import random

def records():
    # simulates records generator
    for i in range(100):
        yield {"a": i, "b": random.randint(0,100), "c": random.randint(0,100)}

def stream(records):
    threads = 8
    pool = ThreadPoolExecutor(threads)

    def rest_api_lookup(record_dict):
        # simulates REST call :)
        sleep(0.1)
        key = record_dict["key"]
        record = record_dict["record"]
        record[key + "_neg"] = -record[key]
        return record

    def thread(records):
        chunk = []
        for record in records:
            for key in record:
                chunk.append(pool.submit(rest_api_lookup, {"record": record, "key": key}))
            if len(chunk) == threads:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

    def unchunk(chunk_gen):
        """Flattens a generator of Future chunks into a generator of Future results."""
        for chunk in chunk_gen:
            for f in chunk:
                yield f.result() # get result from Future

    # Now iterate over all results in same order as records
    for result in unchunk(thread(records)):
        #yield result
        pprint(result)

stream(records())
Answer
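The first problem here is that you are looping over the keys of a record while that same record is growing (each lookup adds a _neg key to it), so loop over a copy of the keys: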
for key in list(record): # make a copy of the keys!
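Why the copy matters can be shown without threads. In the threaded code the new key is added from a worker thread, so whether the loop notices depends on timing; in this single-threaded sketch (add_negations and its copy_keys flag are hypothetical names) the failure is deterministic, because Python raises RuntimeError when a dict changes size while it is being iterated:

```python
def add_negations(record, copy_keys=True):
    """Add a "<key>_neg" entry for every original key of record, in place."""
    # list(record) snapshots the keys, so the new "_neg" keys are never visited.
    keys = list(record) if copy_keys else record
    for key in keys:
        record[key + "_neg"] = -record[key]
    return record


print(add_negations({"a": 2, "b": 36, "c": 77}))
# {'a': 2, 'b': 36, 'c': 77, 'a_neg': -2, 'b_neg': -36, 'c_neg': -77}

try:
    # Iterating the live dict: the first insertion breaks the loop.
    add_negations({"a": 2, "b": 36, "c": 77}, copy_keys=False)
except RuntimeError as exc:
    print("RuntimeError:", exc)
```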
I think the second problem is that you have 3 keys and 8 threads: len(chunk) will be 3, 6, 9, ... while threads is 8, so the following condition is never met:
if len(chunk) == threads: # try len(chunk) >= threads
    yield chunk
    chunk = []
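The chunk-size arithmetic can be checked with a small simulation of thread() that only counts futures instead of submitting them. chunk_sizes and its parameters are hypothetical names chosen for illustration:

```python
def chunk_sizes(num_records, keys_per_record, threads, use_ge):
    """Simulate thread(): return the sizes of the chunks it would yield."""
    sizes = []
    chunk = 0
    for _ in range(num_records):
        chunk += keys_per_record  # one future submitted per key
        # The check runs once per record, after all of its keys were submitted.
        full = chunk >= threads if use_ge else chunk == threads
        if full:
            sizes.append(chunk)
            chunk = 0
    if chunk:
        sizes.append(chunk)  # leftover partial chunk at the end
    return sizes


print(chunk_sizes(5, 3, 8, use_ge=False))  # [15]
print(chunk_sizes(5, 3, 8, use_ge=True))   # [9, 6]
```

With ==, thread() yields one chunk only after the whole input has been consumed, so with 100 records all 300 futures are submitted before the first result is read. With >=, every chunk still ends on a record boundary, because the check runs after all keys of a record have been submitted.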
The last problem is that you yield unfinished records before all of their threads have completed. Here is a possible solution:
def unchunk(chunk_gen):
    """Flattens a generator of Future chunks into a generator of Future results."""
    for chunk in chunk_gen:
        old_res = None
        for f in chunk:
            res = f.result() # get result from Future
            if old_res and res is not old_res:
                yield old_res
            old_res = res
        if old_res:
            yield old_res
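A different way to avoid both the shared mutation and the duplicated records is to have each worker return only the looked-up value and let the main thread assemble a new dict per record. This is a sketch of that alternative, not the method above; here rest_api_lookup takes a single value (an assumed signature, unlike the question's), and batch_size (records per batch) is a hypothetical parameter that bounds how many futures are in flight:

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep


def rest_api_lookup(value):
    """Hypothetical stand-in for the REST call: one value in, one value out."""
    sleep(0.01)
    return -value


def finish_batch(batch):
    """Build one output dict per record; runs only in the consuming thread."""
    for record, futures in batch:
        result = {}
        for key, value in record.items():
            result[key] = value
            result[key + "_neg"] = futures[key].result()  # waits if not done
        yield result


def stream(records, threads=8, batch_size=8):
    """Yield each finished record exactly once, in input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        batch = []
        for record in records:
            # Workers receive plain values and never touch the record dict.
            futures = {key: pool.submit(rest_api_lookup, value)
                       for key, value in record.items()}
            batch.append((record, futures))
            if len(batch) >= batch_size:
                yield from finish_batch(batch)
                batch = []
        yield from finish_batch(batch)  # leftover records (may be empty)
```

With the question's generator, usage would be: for result in stream(records()): pprint(result). Because no worker writes to a dict that another thread reads, no locks are needed, and each record is yielded once, only after all of its lookups have finished.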