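asyncio: Task was destroyed but it is pending

Date: 2020-07-01 19:38:50

Tags: python asynchronous python-asyncio coroutine event-loop

I am working on a sample program that reads data from a data source (csv or rdbms) in chunks, applies some transformations, and sends it to a server over a socket.

However, because the csv is very large, for testing purposes I want to stop reading after a few chunks. Unfortunately something goes wrong, and I do not know what or how to fix it. I probably have to cancel something, but I am not sure where or how. I get the following error: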

Task was destroyed but it is pending!
task: <Task pending coro=<<async_generator_athrow without __name__>()>>

The sample code is:

import asyncio
import json

async def readChunks():
    # this is basically a dummy alternative for reading csv in chunks
    df = [{"chunk_" + str(x) : [r for r in range(10)]} for x in range(10)]
    for chunk in df:
        await asyncio.sleep(0.001)
        yield chunk

async def send(row):
    j = json.dumps(row)
    print(f"to be sent: {j}")
    await asyncio.sleep(0.001)


async def main():
    i = 0
    async for chunk in readChunks():
        for k, v in chunk.items():
            await asyncio.gather(send({k:v}))
        i += 1
        if i > 5:
            break
        #print(f"item in main via async generator is {chunk}")
    

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()

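4 answers:

Answer 0 (score: 1)

Many async resources, such as generators, need the help of the event loop to clean up. When an async for loop stops iterating an async generator via break, the generator is cleaned up only by the garbage collector. This means the cleanup task is pending (waiting for the event loop) but gets destroyed (by the garbage collector).

The most straightforward fix is to explicitly aclose the generator: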

async def main():
    i = 0
    aiter = readChunks()      # name iterator in order to ...
    try:
        async for chunk in aiter:
            ...
            i += 1
            if i > 5:
                break
    finally:
        await aiter.aclose()  # ... clean it up when done
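
The same try/finally pattern can also be written as a context manager. The following is a minimal sketch, assuming Python 3.10 or newer for contextlib.aclosing (a hand-written fallback is included for older versions). The names read_chunks, cleanup_log and closed_inside_main are illustrative and not part of the original code:

```python
import asyncio

try:
    from contextlib import aclosing  # standard library since Python 3.10
except ImportError:
    from contextlib import asynccontextmanager

    @asynccontextmanager
    async def aclosing(agen):
        # Fallback for older interpreters: same idea, written by hand.
        try:
            yield agen
        finally:
            await agen.aclose()

cleanup_log = []  # illustrative: records when the generator's cleanup runs


async def read_chunks():
    try:
        for x in range(10):
            await asyncio.sleep(0.001)
            yield {f"chunk_{x}": list(range(10))}
    finally:
        cleanup_log.append("read_chunks closed")


async def main():
    received = []
    async with aclosing(read_chunks()) as chunks:
        async for chunk in chunks:
            received.append(chunk)
            if len(received) > 5:  # same cut-off as in the question
                break
    # The generator has been closed by the time the `async with` block exits.
    closed_inside_main = bool(cleanup_log)
    return received, closed_inside_main


received, closed_inside_main = asyncio.run(main())
```

Because the generator is closed inside main(), no cleanup is left over for the garbage collector to schedule after the loop has stopped.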

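These patterns can be simplified with asyncstdlib (disclaimer: I maintain this library). asyncstdlib.islice takes a fixed number of items from a generator and then closes it cleanly: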

import asyncstdlib as a

async def main():
    async for chunk in a.islice(readChunks(), 5):
        ...
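
For readers who prefer to avoid a dependency, the idea can be approximated with the standard library alone. This is only a rough sketch of the concept, not how asyncstdlib implements islice; take, read_chunks and closed are hypothetical names introduced for illustration:

```python
import asyncio

closed = []  # illustrative: records when the source generator is cleaned up


async def read_chunks():
    try:
        for x in range(10):
            await asyncio.sleep(0.001)
            yield {f"chunk_{x}": list(range(10))}
    finally:
        closed.append("read_chunks")


async def take(source, limit):
    """Yield at most `limit` items from `source`, then close `source`."""
    try:
        if limit > 0:
            count = 0
            async for item in source:
                yield item
                count += 1
                if count >= limit:
                    break
    finally:
        # Close the wrapped generator while the event loop is still running.
        await source.aclose()


async def main():
    # The consumer runs `take` to exhaustion, so nothing is left suspended.
    return [chunk async for chunk in take(read_chunks(), 5)]


chunks = asyncio.run(main())
```

Note that this only helps when the consumer iterates take(...) to the end. If the consumer itself breaks out early, the wrapper is left suspended in the same way and needs closing too.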

If the break condition is dynamic, scoping the iterator guarantees cleanup in any case:

import asyncstdlib as a

async def main():
    async with a.scoped_iter(readChunks()) as aiter:
        async for idx, chunk in a.enumerate(aiter):
            ...
            if idx >= 5:
                break

Answer 1 (score: 1)

This works...

import asyncio
import json
import logging

logging.basicConfig(format='%(asctime)s.%(msecs)03d %(message)s',
                    datefmt='%S')
root = logging.getLogger()
root.setLevel(logging.INFO)

async def readChunks():
    # this is basically a dummy alternative for reading csv in chunks
    df = [{"chunk_" + str(x) : [r for r in range(10)]} for x in range(10)]
    for chunk in df:
        await asyncio.sleep(0.002)
        root.info('readChunks: next chunk coming')
        yield chunk

async def send(row):
    j = json.dumps(row)
    root.info(f"to be sent: {j}")
    await asyncio.sleep(0.002)


async def main():
    i = 0
    root.info('main: starting to read chunks')
    async for chunk in readChunks():
        for k, v in chunk.items():
            root.info(f'main: sending an item')
            #await asyncio.gather(send({k:v}))
            stuff = await send({k:v})
        i += 1
        if i > 5:
            break
        #print(f"item in main via async generator is {chunk}")

##loop = asyncio.get_event_loop()
##loop.run_until_complete(main())
##loop.close()

if __name__ == '__main__':

    asyncio.run(main())

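...at least it runs and finishes.


The problem of stopping an async generator by exiting an async for loop is described in bugs.python.org/issue38013, and it appears to have been fixed in 3.7.5.

However, using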

loop = asyncio.get_event_loop()
loop.set_debug(True)
loop.run_until_complete(main())
loop.close()

I get the debug error, but no exception, in Python 3.8.

Task was destroyed but it is pending!
task: <Task pending name='Task-8' coro=<<async_generator_athrow without __name__>()>>

Using the higher-level API asyncio.run(main()) with debugging on, I do not get the debug message. If you are going to upgrade to Python 3.7.5-9, you should probably still use asyncio.run().
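
Debug mode can also be switched on through asyncio.run itself, which accepts a debug argument. According to its documentation, asyncio.run additionally takes care of finalizing asynchronous generators before it closes the loop, which the manual get_event_loop / run_until_complete / close sequence does not do. A minimal sketch, with read_chunks, sent and processed as illustrative names:

```python
import asyncio
import json

sent = []  # illustrative: stands in for data written to the socket


async def read_chunks():
    for x in range(10):
        await asyncio.sleep(0.001)
        yield {f"chunk_{x}": list(range(10))}


async def send(row):
    sent.append(json.dumps(row))
    await asyncio.sleep(0.001)


async def main():
    i = 0
    async for chunk in read_chunks():
        for k, v in chunk.items():
            await send({k: v})
        i += 1
        if i > 5:
            break
    return i


# debug=True replaces the separate loop.set_debug(True) call.
processed = asyncio.run(main(), debug=True)
```

Whether any warning still shows up may depend on the exact Python version, so it is worth running this on the interpreter you actually deploy.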

Answer 2 (score: 0)

The problem is simple. You exit the loop early, but the async generator is not yet exhausted (it is still pending):

...
if i > 5:
    break
...
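
A small experiment can make this visible. The sketch below keeps a reference to the generator so the garbage collector cannot finalize it, and records the order of events; numbers and events are illustrative names:

```python
import asyncio

events = []  # illustrative: records the order in which things happen


async def numbers():
    try:
        for n in range(10):
            await asyncio.sleep(0)
            yield n
    finally:
        events.append("generator cleaned up")


async def main():
    gen = numbers()  # keep a reference so it is not finalized by the GC
    async for n in gen:
        if n >= 2:
            break
    # At this point the generator is only suspended at its `yield`.
    events.append("loop exited")
    await gen.aclose()  # raises GeneratorExit inside the generator
    events.append("aclose returned")


asyncio.run(main())
```

The generator's finally block runs only once aclose() is awaited, which is after the loop has already been left.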

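Answer 3 (score: 0)

Your readChunks runs asynchronously inside your loop, and you break out of the loop before it has completed.

That is why you get the asyncio message "Task was destroyed but it is pending".

In short, the async task was still doing its work in the background, but you killed it by breaking out of the loop (and ending the program).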