I am downloading JSON documents from an API using the asyncio module. The crux of my question is this: given an event loop set up as follows:
loop = asyncio.get_event_loop()
main_task = asyncio.ensure_future( klass.download_all() )
loop.run_until_complete( main_task )
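and download_all() implemented as the instance method below, on a class that already has its downloader objects created and available, so that it calls each one's download method: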
async def download_all(self):
    """ Builds the coroutines, uses asyncio.wait, then sifts for those still pending, loops """
    ret = []
    async with aiohttp.ClientSession() as session:
        pending = []
        for downloader in self._downloaders:
            pending.append( asyncio.ensure_future( downloader.download(session) ) )
        while pending:
            dne, pnding= await asyncio.wait(pending)
            ret.extend( [d.result() for d in dne] )
            # Get all the tasks, cannot use "pnding"
            # (asyncio.Task.all_tasks() was removed in Python 3.9; asyncio.all_tasks() replaces it on 3.7+)
            tasks = asyncio.Task.all_tasks()
            pending = [tks for tks in tasks if not tks.done()]
            # Exclude the one that we know hasn't ended yet (UGLY)
            pending = [t for t in pending if not t._coro.__name__ == self.download_all.__name__]
    return ret
Why is it that, in the downloaders' download methods, when I use asyncio.ensure_future instead of the await syntax, it runs much faster and, judging from the logs, looks more "asynchronous"?
This works because I detect all the tasks that are still pending and do not let download_all finish; it keeps calling asyncio.wait until none are left.
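The mechanism described above can be sketched in isolation. This is only a minimal illustration under stated assumptions: it needs Python 3.7+ (for asyncio.run, asyncio.all_tasks and asyncio.current_task), the download and write coroutines are hypothetical stand-ins that sleep instead of doing real I/O, and the current task is excluded by identity rather than by coroutine name.

```python
import asyncio

async def write(log):
    # Stand-in for the aiofiles write; sleeps instead of touching disk.
    await asyncio.sleep(0.01)
    log.append("written")

async def download(log):
    # Fire-and-forget: schedule the write, then return without awaiting it.
    asyncio.ensure_future(write(log))
    return "payload"

async def main():
    log = []
    result = await download(log)
    # download() has returned, but the scheduled write has not run yet.
    log_at_return = list(log)
    # Collect every unfinished task except the one running main() itself.
    pending = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    if pending:
        await asyncio.wait(pending)
    return result, log_at_return, log

if __name__ == "__main__":
    print(asyncio.run(main()))
```

If nothing waited on the leftover task, the loop could stop before the write ever ran, which is why download_all has to keep looping.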
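I thought the await keyword was what let the event loop do its work and share resources efficiently. Why is doing it this way faster? Is there anything wrong with it? For example: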
async def download(self, session):
    async with session.request(self.method, self.url, params=self.params) as response:
        response_json = await response.json()
        # Not using await here, as I am "supposed" to
        asyncio.ensure_future( self.write(response_json, self.path) )
        return response_json

async def write(self, res_json, path):
    # using aiofiles to write, but it doesn't (seem to?) support direct json
    # so converting to raw text first
    txt_contents = json.dumps(res_json, **self.json_dumps_kwargs)
    async with aiofiles.open(path, 'w') as f:
        await f.write(txt_contents)
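With the full implementation and a real API, I was able to download 44 resources in 34 seconds this way, but with await it took more than three minutes (I actually gave up because it was taking so long).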
Answer 0 (score: 0)
When you await in each iteration of the for loop, each iteration waits for its download to finish before the next one starts.
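When you use ensure_future, on the other hand, it does not wait: it creates tasks to download all the files, and then all of them are awaited in the second loop.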
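The difference can be seen in a small timing sketch. It is not the asker's code: fake_download is a hypothetical stand-in that sleeps instead of calling the API, asyncio.gather is used here in place of the asyncio.wait loop, and Python 3.7+ is assumed for asyncio.run.

```python
import asyncio
import time

async def fake_download(i, delay=0.05):
    # Stand-in for downloader.download(session); sleeps instead of doing I/O.
    await asyncio.sleep(delay)
    return i

async def sequential(n):
    # Awaiting inside the loop: each download finishes before the next starts.
    results = []
    for i in range(n):
        results.append(await fake_download(i))
    return results

async def concurrent(n):
    # First pass only schedules the downloads as tasks; nothing is awaited yet.
    tasks = [asyncio.ensure_future(fake_download(i)) for i in range(n)]
    # Second pass awaits them all together; gather preserves the input order.
    return await asyncio.gather(*tasks)

def timed(coro_fn, n):
    # Run one variant in a fresh event loop and measure wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(coro_fn(n))
    return list(results), time.perf_counter() - start

if __name__ == "__main__":
    _, seq_time = timed(sequential, 10)
    _, con_time = timed(concurrent, 10)
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With ten simulated downloads, the sequential version takes roughly the sum of the delays, while the concurrent version takes roughly the longest single delay.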