I am working on a sample program that reads data in chunks from a data source (CSV or an RDBMS), applies some transformations, and sends the result over a socket to a server.
However, since the CSV is very large, for testing purposes I want to break out of the read after a few chunks. Unfortunately something goes wrong, and I don't know what to do or how to fix it. Probably I have to cancel something, but I am not sure where and how. I get the following error:
Task was destroyed but it is pending!
task: <Task pending coro=<<async_generator_athrow without __name__>()>>
The sample code is:
import asyncio
import json

async def readChunks():
    # this is basically a dummy alternative for reading csv in chunks
    df = [{"chunk_" + str(x): [r for r in range(10)]} for x in range(10)]
    for chunk in df:
        await asyncio.sleep(0.001)
        yield chunk

async def send(row):
    j = json.dumps(row)
    print(f"to be sent: {j}")
    await asyncio.sleep(0.001)

async def main():
    i = 0
    async for chunk in readChunks():
        for k, v in chunk.items():
            await asyncio.gather(send({k: v}))
        i += 1
        if i > 5:
            break
        #print(f"item in main via async generator is {chunk}")

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
Answer 0 (score: 1)
Many async resources, such as generators, need the help of an event loop to be cleaned up. When an async for loop stops iterating an async generator via break, the generator is cleaned up only by the garbage collector. This means the task is pending (it waits for the event loop) but gets destroyed (by the garbage collector).
The most straightforward fix is to explicitly aclose the generator:
async def main():
    i = 0
    aiter = readChunks()  # name iterator in order to ...
    try:
        async for chunk in aiter:
            ...
            i += 1
            if i > 5:
                break
    finally:
        await aiter.aclose()  # ... clean it up when done
These patterns can be simplified using asyncstdlib (disclaimer: I maintain this library). asyncstdlib.islice allows taking a fixed number of items before cleanly closing the generator:
import asyncstdlib as a

async def main():
    async for chunk in a.islice(readChunks(), 5):
        ...
If the break condition is dynamic, scoping the iterator guarantees cleanup in all cases:
import asyncstdlib as a

async def main():
    async with a.scoped_iter(readChunks()) as aiter:
        async for idx, chunk in a.enumerate(aiter):
            ...
            if idx >= 5:
                break
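If adding a dependency is not an option, the islice idea can be approximated with a small stdlib-only helper (the name aislice is made up for this sketch): it yields at most n items and closes the source generator in a finally block, so cleanup runs under the event loop as long as the caller exhausts the helper rather than breaking out of it:

```python
import asyncio

async def aislice(source, n):
    # Hypothetical helper: yield at most n items from an async
    # generator, then close it so its cleanup runs under the loop.
    try:
        if n <= 0:
            return
        count = 0
        async for item in source:
            yield item
            count += 1
            if count >= n:
                break
    finally:
        await source.aclose()

async def readChunks():
    # dummy chunk source, as in the question
    for x in range(10):
        await asyncio.sleep(0.001)
        yield {"chunk_" + str(x): list(range(10))}

async def main():
    # no break here: exhausting aislice lets its finally clause run
    async for chunk in aislice(readChunks(), 5):
        print(chunk)

asyncio.run(main())
```

Note this only shifts the problem if the caller in turn breaks out of aislice; the asyncstdlib helpers above handle those nested cases as well.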
Answer 1 (score: 1)
This works...
import asyncio
import json
import logging

logging.basicConfig(format='%(asctime)s.%(msecs)03d %(message)s',
                    datefmt='%S')
root = logging.getLogger()
root.setLevel(logging.INFO)

async def readChunks():
    # this is basically a dummy alternative for reading csv in chunks
    df = [{"chunk_" + str(x): [r for r in range(10)]} for x in range(10)]
    for chunk in df:
        await asyncio.sleep(0.002)
        root.info('readChunks: next chunk coming')
        yield chunk

async def send(row):
    j = json.dumps(row)
    root.info(f"to be sent: {j}")
    await asyncio.sleep(0.002)

async def main():
    i = 0
    root.info('main: starting to read chunks')
    async for chunk in readChunks():
        for k, v in chunk.items():
            root.info(f'main: sending an item')
            #await asyncio.gather(send({k:v}))
            stuff = await send({k: v})
        i += 1
        if i > 5:
            break
        #print(f"item in main via async generator is {chunk}")

##loop = asyncio.get_event_loop()
##loop.run_until_complete(main())
##loop.close()

if __name__ == '__main__':
    asyncio.run(main())
...at least it runs and completes.
The problem of stopping an async generator by breaking out of the async for loop is described in bugs.python.org/issue38013, and appears to have been fixed in 3.7.5.
However, using
loop = asyncio.get_event_loop()
loop.set_debug(True)
loop.run_until_complete(main())
loop.close()
I get the debug message below, but no exception, in Python 3.8:
Task was destroyed but it is pending!
task: <Task pending name='Task-8' coro=<<async_generator_athrow without __name__>()>>
Using the high-level API asyncio.run(main()) with debugging ON, I do not get the debug message. If you are going to upgrade to Python 3.7.5-9, you should probably still use asyncio.run().
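The difference comes from cleanup that asyncio.run() performs on exit: it calls loop.shutdown_asyncgens(), which closes any still-alive async generators under the event loop. With manual loop management that step can be added explicitly; a minimal sketch (the generator is held in a variable here so it is still alive at shutdown rather than already collected by the GC):

```python
import asyncio

finalized = []

async def gen():
    try:
        for i in range(10):
            await asyncio.sleep(0.001)
            yield i
    finally:
        # runs when the generator is closed
        finalized.append(True)

async def main(it):
    async for i in it:
        if i >= 2:
            break  # leaves `it` un-exhausted

it = gen()  # hold a reference so the GC does not finalize it first
loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(main(it))
    # asyncio.run() does this automatically; with manual loop
    # management it must be called explicitly so pending async
    # generators are closed under the loop instead of by the GC
    loop.run_until_complete(loop.shutdown_asyncgens())
finally:
    loop.close()
```

loop.shutdown_asyncgens() has been available since Python 3.6; asyncio.run() (3.7+) simply bundles it with the rest of the loop teardown.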
Answer 2 (score: 0)
The problem is simple. You exit the loop early, but the async generator is not yet exhausted (it is still pending):
...
if i > 5:
    break
...
Answer 3 (score: 0)
Your readChunks is running asynchronously, and your loop is running too, and you destroy it before the program finishes.
That is why it gives asyncio task was destroyed but it is pending
In short, the async task was doing its work in the background, but you killed it by breaking out of the loop (stopping the program).