Best practices for backing off / rate-limiting asynchronous requests

Time: 2020-02-12 15:48:08

Tags: python rest asynchronous python-asyncio aiohttp

Scenario: I need to collect paginated data from a web application's API, which has a call limit of 100 per minute. The API objects I need to retrieve contain 100 items per page across 105 pages and growing (about 10,500 items in total). Synchronous code takes about 15 minutes to retrieve all the pages, so I didn't have to worry about hitting the call limit. However, I wanted to speed up the data retrieval, so I implemented asynchronous calls with asyncio and aiohttp. Data now downloads in 15 seconds, which is great.

Problem: I'm now hitting the call limit, so the last 5 or so calls come back with 403 errors.

Proposed solution: I implemented the try/except below in the get_data() function. When a call fails with 403: Exceeded call limit, I back off back_off seconds and retry up to retries times:

async def get_data(session, url):
    retries = 3
    back_off = 60  # seconds to wait before trying again
    for _ in range(retries):
        try:
            async with session.get(url, headers=headers) as response:
                if response.status != 200:
                    response.raise_for_status()
                print(retries, response.status, url)
                return await response.json()
        except aiohttp.client_exceptions.ClientResponseError as e:
            retries -= 1
            await asyncio.sleep(back_off)
            continue

async def main():
    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee') # returns list of URLs to call asynchronously in get_data()
        attendee_data = await asyncio.gather(*[get_data(session, attendee_url) for attendee_url in attendee_urls])
        return attendee_data

if __name__ == '__main__':
    data = asyncio.run(main())

Question: How do I throttle the aiohttp calls so that they stay below the 100 calls/minute threshold without triggering 403 responses and having to back off? I have tried the following modules, but neither seemed to work: ratelimiter, ratelimit.

Goal: Make 100 asynchronous calls per minute, but back off and retry when necessary (on 403: Exceeded call limit).
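To make the retry/back-off pattern described above concrete, here is a minimal, runnable sketch. FlakySession is a hypothetical stand-in for aiohttp.ClientSession (it fails with a rate-limit error a fixed number of times, then succeeds), and the short back_off value is only so the example finishes quickly; the real code would use aiohttp and a 60-second back-off.

```python
import asyncio

class FlakySession:
    """Hypothetical stand-in for aiohttp.ClientSession: raises a
    rate-limit error a fixed number of times, then succeeds."""
    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def get_json(self, url):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("403: Exceeded call limit")
        return {"url": url, "items": []}

async def get_data(session, url, retries=3, back_off=0.01):
    # Retry pattern from the question: on a rate-limit error,
    # sleep back_off seconds and try again, up to `retries` attempts.
    for _ in range(retries):
        try:
            return await session.get_json(url)
        except RuntimeError:
            await asyncio.sleep(back_off)
    return None

session = FlakySession(failures=2)
data = asyncio.run(get_data(session, "https://example.com/attendee"))
# Two failed attempts, then the third succeeds.
```

The drawback, as the question notes, is that the back-off only reacts after a 403 has already been received, rather than preventing the limit from being hit in the first place.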

1 Answer:

Answer 0 (score: 1):

You can achieve "at most 100 requests per minute" by adding a delay before each request.

100 requests per minute is equivalent to 1 request per 0.6 seconds.

async def main():

    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee') # returns list of URLs to call asynchronously in get_data()
        coroutines = []
        for attendee_url in attendee_urls:
            coroutines.append(get_data(session, attendee_url))
            await asyncio.sleep(0.6)
        attendee_data = await asyncio.gather(*coroutines)
        return attendee_data
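As a runnable illustration of this pacing approach, the sketch below replaces get_data() with a dummy coroutine that records when each "request" starts, and uses a much shorter spacing (0.05 s instead of 0.6 s) so it finishes quickly. The names fetch and spacing are illustrative, not from the original code.

```python
import asyncio
import time

async def fetch(i, started):
    # Stand-in for get_data(); records when each "request" starts.
    started.append(time.monotonic())
    await asyncio.sleep(0)  # simulate I/O
    return i

async def main(n, spacing=0.05):  # 0.6 s in the real case; shortened here
    started = []
    tasks = []
    for i in range(n):
        tasks.append(asyncio.create_task(fetch(i, started)))
        await asyncio.sleep(spacing)  # pace the task launches
    results = await asyncio.gather(*tasks)
    return results, started

results, started = asyncio.run(main(5))
# Consecutive start times are roughly `spacing` apart, so at most
# 1/spacing requests are launched per second.
```

Note that this caps the rate at which requests are started; if responses are slow, requests can still pile up concurrently, which is where the semaphore below comes in.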

In addition to a request rate limit, APIs often also limit the number of simultaneous requests. If so, you can use a BoundedSemaphore:

async def main():
    sema = asyncio.BoundedSemaphore(50) # Assuming a concurrent requests limit of 50
...
            coroutines.append(get_data(sema, session, attendee_url))
...

async def get_data(sema, session, url):

...

    for _ in range(retries):
        try:
            async with sema:  # limit the number of concurrent requests
                async with session.get(url, headers=headers) as response:
                    if response.status != 200:
                        response.raise_for_status()
...
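To show the semaphore's effect end to end, here is a self-contained sketch. FakeAPI is a hypothetical stand-in for an aiohttp session that tracks how many "requests" are in flight at once; the concurrency cap of 5 and the URLs are illustrative.

```python
import asyncio

class FakeAPI:
    """Hypothetical stand-in for an aiohttp session; tracks peak concurrency."""
    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def get(self, url):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)  # simulate network latency
        self.in_flight -= 1
        return {"url": url}

async def get_data(sema, api, url):
    async with sema:  # at most N requests in flight at once
        return await api.get(url)

async def main():
    sema = asyncio.BoundedSemaphore(5)  # assumed concurrency limit of 5
    api = FakeAPI()
    urls = [f"https://example.com/page/{i}" for i in range(20)]
    results = await asyncio.gather(*[get_data(sema, api, u) for u in urls])
    return api.peak, results

peak, results = asyncio.run(main())
# All 20 coroutines are launched at once, but the semaphore keeps
# no more than 5 requests in flight at any moment.
```

In the real code you would combine this with the 0.6-second pacing above: the delay keeps you under the per-minute quota, and the semaphore keeps you under the concurrent-connection limit.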