Scenario: I need to collect paginated data from a web application's API, which has a limit of 100 calls per minute. The API object I need returns 100 items per page, across 105 pages and growing (about 10,500 items in total). The synchronous code took roughly 15 minutes to retrieve all pages, so the call limit was never a concern. However, I wanted to speed up retrieval, so I implemented asynchronous calls using aiohttp and asyncio. The data now downloads in 15 seconds, which is great.
Problem: I am now hitting the call limit, so I receive 403 errors on roughly the last 5 calls.
Proposed solution: I implemented a try/except in the get_data() function. When a call fails due to 403: Exceeded call limit, I back off for back_off seconds and then retry up to retries times:
async def get_data(session, url):
    retries = 3
    back_off = 60  # seconds to wait before retrying
    for _ in range(retries):
        try:
            async with session.get(url, headers=headers) as response:
                if response.status != 200:
                    response.raise_for_status()
                print(retries, response.status, url)
                return await response.json()
        except aiohttp.client_exceptions.ClientResponseError as e:
            retries -= 1
            await asyncio.sleep(back_off)
            continue

async def main():
    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee')  # returns list of URLs to call asynchronously in get_data()
        attendee_data = await asyncio.gather(*[get_data(session, attendee_url) for attendee_url in attendee_urls])
        return attendee_data

if __name__ == '__main__':
    data = asyncio.run(main())

Question: How can I throttle the aiohttp calls so that they stay under the 100 calls/minute threshold, without ever triggering the 403 and having to back off? I have tried several modules (including ratelimiter and ratelimit), but none of them seemed to work.
Goal: Make 100 async calls per minute, but back off and retry when needed (on 403: Exceeded call limit).
Answer 0: (score: 1)
You can enforce "at most 100 requests per minute" by adding a delay before each request. 100 requests per minute works out to 1 request every 0.6 seconds.
async def main():
    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee')  # returns list of URLs to call asynchronously in get_data()
        tasks = []
        for attendee_url in attendee_urls:
            # create_task starts the request immediately; sleeping between
            # task creations spaces the request starts 0.6 s apart (a bare
            # coroutine would not run until gather(), so the delay would
            # have no throttling effect)
            tasks.append(asyncio.create_task(get_data(session, attendee_url)))
            await asyncio.sleep(0.6)
        attendee_data = await asyncio.gather(*tasks)
        return attendee_data
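The delay-per-request idea can be checked without hitting a real API. Below is a minimal stdlib-only sketch, where `fake_fetch` is a hypothetical stand-in for the aiohttp request and the interval is shrunk for the demo; it starts one task every `interval` seconds:

```python
import asyncio
import time

async def fake_fetch(url, started):
    # hypothetical stand-in for an aiohttp request; records its start time
    started.append((url, time.monotonic()))
    await asyncio.sleep(0.01)  # simulated round trip
    return url

async def gather_throttled(urls, interval):
    started = []
    tasks = []
    for url in urls:
        # create_task schedules the coroutine right away; the sleep then
        # spaces consecutive request starts at least `interval` apart
        tasks.append(asyncio.create_task(fake_fetch(url, started)))
        await asyncio.sleep(interval)
    return await asyncio.gather(*tasks), started

results, started = asyncio.run(gather_throttled(['a', 'b', 'c'], 0.05))
gaps = [t2 - t1 for (_, t1), (_, t2) in zip(started, started[1:])]
print(results)                          # ['a', 'b', 'c']
print(all(g >= 0.04 for g in gaps))     # True: starts are spaced apart
```

For the real 100 calls/minute limit you would pass 0.6 as the interval; the same spacing logic applies unchanged.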
Besides the request rate, APIs often also limit the number of simultaneous requests. If yours does, you can use asyncio.BoundedSemaphore.
async def main():
    sema = asyncio.BoundedSemaphore(50)  # assuming a concurrent-requests limit of 50
    ...
    coroutines.append(get_data(sema, session, attendee_url))
    ...

async def get_data(sema, session, url):
    ...
    for _ in range(retries):
        try:
            async with sema:
                async with session.get(url, headers=headers) as response:
                    if response.status != 200:
                        response.raise_for_status()
    ...
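The semaphore pattern can likewise be exercised without aiohttp. This is a minimal sketch, assuming a hypothetical `fake_request` in place of the real `session.get`, which confirms that no more requests than the semaphore's limit are ever in flight at once:

```python
import asyncio

active = 0  # requests currently in flight
peak = 0    # highest concurrency observed

async def fake_request(sema, url):
    # hypothetical stand-in for an aiohttp session.get call
    global active, peak
    async with sema:
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # simulated HTTP round trip
        active -= 1
        return url

async def main():
    sema = asyncio.BoundedSemaphore(3)  # at most 3 requests in flight
    urls = ['u%d' % i for i in range(10)]
    return await asyncio.gather(*(fake_request(sema, u) for u in urls))

results = asyncio.run(main())
print(results == ['u%d' % i for i in range(10)])  # True: gather keeps order
print(peak <= 3)                                  # True: limit never exceeded
```

For a real API you would combine both mechanisms: the 0.6 s spacing to respect the per-minute rate and the semaphore to cap concurrency.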