I use Python's asyncio module and async/await to process a character sequence in chunks concurrently and to collect the results in a list. For that I use a chunker function (split) and a chunk processing function (process_chunk). Both come from a third-party library, and I would prefer not to change them.
Chunking is slow, and the number of chunks is not known up front, which is why I don't want to consume the whole chunk generator at once. Ideally, the code should advance the generator in sync with the semaphore in process_chunk, i.e. pull the next chunk each time that function returns.
My code:
import asyncio

def split(sequence):
    for x in sequence:
        print('Getting the next chunk:', x)
        yield x
    print('Finished chunking')

async def process_chunk(chunk, *, semaphore=asyncio.Semaphore(2)):
    async with semaphore:
        print('Processing chunk:', chunk)
        await asyncio.sleep(3)
        return 'OK'

async def process_in_chunks(sequence):
    gen = split(sequence)
    coro = [process_chunk(chunk) for chunk in gen]
    results = await asyncio.gather(*coro)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(process_in_chunks('ABC'))
It kind of works and prints
Getting the next chunk: A
Getting the next chunk: B
Getting the next chunk: C
Finished chunking
Processing chunk: C
Processing chunk: B
Processing chunk: A
although this means that the gen generator is exhausted before the processing starts. I know why this happens, but how do I change it?
Answer 0 (score: 4)
If you don't mind having an external dependency, you can use aiostream.stream.map:
from aiostream import stream, pipe

async def process_in_chunks(sequence):
    # Asynchronous sequence of chunks
    xs = stream.iterate(split(sequence))
    # Asynchronous sequence of results
    ys = xs | pipe.map(process_chunk, task_limit=2)
    # Aggregation of the results into a list
    zs = ys | pipe.list()
    # Run the stream
    results = await zs
    print(results)
The chunks are generated lazily and fed to the process_chunk coroutine. The number of coroutines running concurrently is controlled by task_limit, which means the semaphore inside process_chunk is no longer necessary.
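To reproduce the output below, a minimal entry point for this version might look as follows; this is a sketch reusing the question's entry-point style, with split and process_chunk defined as in the question:

import asyncio

# Minimal runner for the aiostream version above; split and process_chunk
# come from the question, process_in_chunks is the coroutine defined above.
if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(process_in_chunks('ABC'))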
Output:
Getting the next chunk: A
Processing chunk: A
Getting the next chunk: B
Processing chunk: B
# Pause 3 seconds
Getting the next chunk: C
Processing chunk: C
Finished chunking
# Pause 3 seconds
['OK', 'OK', 'OK']
See this demonstration and the documentation for more examples.
Answer 1 (score: 2)
Iterate gen manually using next:
import asyncio

# third-party:
def split(sequence):
    for x in sequence:
        print('Getting the next chunk:', x)
        yield x
    print('Finished chunking')

async def process_chunk(chunk, *, semaphore=asyncio.Semaphore(2)):
    async with semaphore:
        print('Processing chunk:', chunk)
        await asyncio.sleep(3)
        return 'OK'

# our code:
sem = asyncio.Semaphore(2)  # let's use our own semaphore

async def process_in_chunks(sequence):
    tasks = []
    gen = split(sequence)
    while True:
        await sem.acquire()
        try:
            chunk = next(gen)
        except StopIteration:
            break
        else:
            task = asyncio.ensure_future(process_chunk(chunk))  # task to run concurrently
            task.add_done_callback(lambda *_: sem.release())  # allow the next chunk to be processed
            tasks.append(task)
    await asyncio.gather(*tasks, return_exceptions=True)  # await all pending tasks
    results = [task.result() for task in tasks]
    return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(process_in_chunks('ABCDE'))
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()
Output:
Getting the next chunk: A
Getting the next chunk: B
Processing chunk: A
Processing chunk: B
Getting the next chunk: C
Getting the next chunk: D
Processing chunk: C
Processing chunk: D
Getting the next chunk: E
Finished chunking
Processing chunk: E
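For reference, the same acquire-before-next pattern can be written with the modern asyncio API (asyncio.create_task and asyncio.run). This is a sketch, not part of the original answers, and it assumes Python 3.10+, where asyncio.Semaphore no longer binds to an event loop at construction, so the default-argument semaphore in process_chunk keeps working under asyncio.run; split and process_chunk are the third-party functions from the question, and the remaining names are illustrative:

import asyncio

_DONE = object()  # sentinel signalling generator exhaustion

async def process_in_chunks(sequence, limit=2):
    sem = asyncio.Semaphore(limit)
    tasks = []
    gen = split(sequence)
    while True:
        await sem.acquire()           # wait until a processing slot frees up
        chunk = next(gen, _DONE)      # only then pull the next chunk
        if chunk is _DONE:
            sem.release()
            break
        task = asyncio.create_task(process_chunk(chunk))
        task.add_done_callback(lambda *_: sem.release())
        tasks.append(task)
    return await asyncio.gather(*tasks)

if __name__ == '__main__':
    print(asyncio.run(process_in_chunks('ABCDE')))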