Question

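I am trying to write some asynchronous GET requests with the aiohttp package, and have most of the pieces figured out, but I am wondering what the standard approach is for handling failures (which are returned as exceptions).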

A general idea of my code so far (after some trial and error, I am following the approach here):

import asyncio
import aiofiles
import aiohttp
from pathlib import Path

with open('urls.txt', 'r') as f:
    urls = [s.rstrip() for s in f.readlines()]

async def fetch(session, url):
    async with session.get(url) as response:
        if response.status != 200:
            response.raise_for_status()
        data = await response.text()
    # (Omitted: some more URL processing goes on here)
    out_path = Path(f'out/')
    if not out_path.is_dir():
        out_path.mkdir()
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)

async def fetch_all(urls, loop):
    async with aiohttp.ClientSession(loop=loop) as session:
        results = await asyncio.gather(*[fetch(session, url) for url in urls],
                return_exceptions=True)
        return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(fetch_all(urls, loop))

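This now runs fine:

As expected, the results variable is populated with None entries wherever the corresponding URL (the one at the same index in the urls list, i.e. on the same line of the input file urls.txt) was requested successfully, and the corresponding file is written to disk.
This means I can use the results variable to determine which URLs were not successful (the entries in results that are not None).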

I have looked at a few guides to the various asynchronous Python packages (aiohttp, aiofiles, and asyncio), but I haven't seen a standard way to handle this final step.

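Should the retry of a GET request happen after the await statement has "finished"/"completed"?
...or should the retry be initiated by some sort of callback upon failure?
- The errors look like this: (ClientConnectorError(111, "Connect call failed ('000.XXX.XXX.XXX', 443)"), i.e. the request to IP address 000.XXX.XXX.XXX on port 443 failed, probably because the server enforces some limit that I should respect by waiting for a timeout before retrying.
Is there some sort of limit I might consider imposing, to batch the requests rather than attempting them all at once?
I am getting about 40-60 successful requests when attempting the few hundred (over 500) URLs in my list.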

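Naively, I was expecting run_until_complete to handle this in such a way that it would finish once all URLs had been requested successfully, but this isn't the case.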

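I haven't worked with asynchronous Python and sessions/loops before, so I would appreciate any help in working out how to get the results. Please let me know if I can give any more information to improve this question, thank you!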

Answer 1

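Should the retry of a GET request happen after the await statement has "finished"/"completed"? ...or should the retry be initiated by some sort of callback upon failure?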

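You can do the former. You don't need any special callback, because you are executing inside a coroutine, so a simple while loop will suffice and won't interfere with the execution of other coroutines. For example: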

async def fetch(session, url):
    data = None
    while data is None:
        try:
            async with session.get(url) as response:
                response.raise_for_status()
                data = await response.text()
        except aiohttp.ClientError:
            # sleep a little and try again
            await asyncio.sleep(1)
    # (Omitted: some more URL processing goes on here)
    out_path = Path(f'out/')
    if not out_path.is_dir():
        out_path.mkdir()
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)
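
The loop above retries forever and still starts every request at once. As a rough sketch of one way to cap both, the helper below gives up after a fixed number of attempts, backs off exponentially between them, and uses an asyncio.Semaphore to bound how many requests are in flight. The names retry, fetch_all_limited, make_flaky, max_attempts, base_delay and limit are made up for this illustration, and the flaky job only simulates a failing connection so the example runs without a network. With aiohttp, the job would wrap the session.get block and retry_on would presumably be aiohttp.ClientError.

```python
import asyncio

async def retry(make_request, max_attempts=5, base_delay=0.01, retry_on=(OSError,)):
    """Await make_request() until it succeeds or max_attempts is used up."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_request()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up; gather(return_exceptions=True) collects it
            # exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))

async def fetch_all_limited(jobs, limit=10):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job):
        async with sem:
            return await retry(job)

    return await asyncio.gather(*(guarded(j) for j in jobs),
                                return_exceptions=True)

def make_flaky(fail_times, value):
    """Hypothetical stand-in for a request: fails fail_times times, then returns value."""
    calls = {"n": 0}

    async def job():
        calls["n"] += 1
        if calls["n"] <= fail_times:
            raise ConnectionError("simulated connect failure")
        return value

    return job

if __name__ == "__main__":
    jobs = [make_flaky(2, "a"), make_flaky(0, "b"), make_flaky(99, "c")]
    print(asyncio.run(fetch_all_limited(jobs, limit=2)))
```

The first two jobs end up with their values, while the third exhausts its attempts and shows up in the results as an exception object, the same way failures appear in the question's results list.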

Naively, I was expecting run_until_complete to handle this in such a way that it would finish once all URLs had been requested successfully

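The term "complete" is meant in the technical sense of a coroutine completing (running its course), which happens either when the coroutine returns or when it raises an exception.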
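
To illustrate that point, here is a small sketch in which one coroutine completes by returning and another completes by raising. asyncio.gather with return_exceptions=True finishes once both have completed either way, and because the results keep the input order they can be zipped with the URLs to find the failures. The succeed, fail and main functions and the example.com URLs are placeholders for this illustration, not part of the original code.

```python
import asyncio

async def succeed(url):
    return None  # mirrors fetch(), which returns nothing on success

async def fail(url):
    raise RuntimeError(f"could not fetch {url}")

async def main(urls):
    # Placeholder rule: URLs containing "bad" simulate a failed request.
    coros = [fail(u) if "bad" in u else succeed(u) for u in urls]
    results = await asyncio.gather(*coros, return_exceptions=True)
    # results keeps the input order, so zip pairs each URL with its outcome
    return [u for u, r in zip(urls, results) if isinstance(r, Exception)]

failed = asyncio.run(main(["http://example.com/ok", "http://example.com/bad"]))
print(failed)  # ['http://example.com/bad']
```

The list of failed URLs could then be fed into another round of requests if retrying inside the coroutine is not wanted.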

Fetching multiple URLs with asyncio / aiohttp and retrying failures