Scenario: I need to collect paginated data from a web application's API, which has a limit of 100 calls per minute. The API object I need returns 100 items per page, across 105 pages and growing (about 10,500 items in total). The synchronous code took roughly 15 minutes to retrieve all pages, so the call limit was never a concern. However, I wanted to speed up retrieval, so I implemented asynchronous calls using aiohttp and asyncio. The data now downloads in 15 seconds, which is great.
Problem: I am now hitting the call limit, so I receive 403 errors on roughly the last 5 calls.
Proposed solution: I implemented a try/except in the get_data() function. When a call fails due to 403: Exceeded call limit, I back off for back_off seconds and then retry up to retries times:
async def get_data(session, url):
    retries = 3
    back_off = 60  # seconds to wait before retrying
    for _ in range(retries):
        try:
            async with session.get(url, headers=headers) as response:
                if response.status != 200:
                    response.raise_for_status()
                print(retries, response.status, url)
                return await response.json()
        except aiohttp.client_exceptions.ClientResponseError as e:
            retries -= 1
            await asyncio.sleep(back_off)
            continue

async def main():
    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee')  # returns list of URLs to call asynchronously in get_data()
        attendee_data = await asyncio.gather(*[get_data(session, attendee_url) for attendee_url in attendee_urls])
        return attendee_data

if __name__ == '__main__':
    data = asyncio.run(main())

Question: How can I throttle the aiohttp calls so that they stay under the 100 calls/minute threshold, without ever triggering the 403 and having to back off? I have tried several modules (including ratelimiter and ratelimit), but none of them seemed to work.
Goal: Make 100 async calls per minute, but back off and retry when needed (on 403: Exceeded call limit).
Answer 0: (score: 1)
You can enforce "at most 100 requests per minute" by adding a delay before each request. 100 requests per minute works out to 1 request every 0.6 seconds.
async def main():
    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee')  # returns list of URLs to call asynchronously in get_data()
        tasks = []
        for attendee_url in attendee_urls:
            # create_task starts the request immediately; sleeping between
            # task creations spaces the request starts 0.6 s apart (a bare
            # coroutine would not run until gather(), so the delay would
            # have no throttling effect)
            tasks.append(asyncio.create_task(get_data(session, attendee_url)))
            await asyncio.sleep(0.6)
        attendee_data = await asyncio.gather(*tasks)
        return attendee_data
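The delay-per-request idea can be checked without hitting a real API. Below is a minimal stdlib-only sketch, where `fake_fetch` is a hypothetical stand-in for the aiohttp request and the interval is shrunk for the demo; it starts one task every `interval` seconds:

```python
import asyncio
import time

async def fake_fetch(url, started):
    # hypothetical stand-in for an aiohttp request; records its start time
    started.append((url, time.monotonic()))
    await asyncio.sleep(0.01)  # simulated round trip
    return url

async def gather_throttled(urls, interval):
    started = []
    tasks = []
    for url in urls:
        # create_task schedules the coroutine right away; the sleep then
        # spaces consecutive request starts at least `interval` apart
        tasks.append(asyncio.create_task(fake_fetch(url, started)))
        await asyncio.sleep(interval)
    return await asyncio.gather(*tasks), started

results, started = asyncio.run(gather_throttled(['a', 'b', 'c'], 0.05))
gaps = [t2 - t1 for (_, t1), (_, t2) in zip(started, started[1:])]
print(results)                          # ['a', 'b', 'c']
print(all(g >= 0.04 for g in gaps))     # True: starts are spaced apart
```

For the real 100 calls/minute limit you would pass 0.6 as the interval; the same spacing logic applies unchanged.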
Besides the request rate, APIs often also limit the number of simultaneous requests. If yours does, you can use asyncio.BoundedSemaphore.
async def main():
    sema = asyncio.BoundedSemaphore(50)  # assuming a concurrent-requests limit of 50
    ...
    coroutines.append(get_data(sema, session, attendee_url))
    ...

async def get_data(sema, session, url):
    ...
    for _ in range(retries):
        try:
            async with sema:
                async with session.get(url, headers=headers) as response:
                    if response.status != 200:
                        response.raise_for_status()
    ...
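The semaphore pattern can likewise be exercised without aiohttp. This is a minimal sketch, assuming a hypothetical `fake_request` in place of the real `session.get`, which confirms that no more requests than the semaphore's limit are ever in flight at once:

```python
import asyncio

active = 0  # requests currently in flight
peak = 0    # highest concurrency observed

async def fake_request(sema, url):
    # hypothetical stand-in for an aiohttp session.get call
    global active, peak
    async with sema:
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # simulated HTTP round trip
        active -= 1
        return url

async def main():
    sema = asyncio.BoundedSemaphore(3)  # at most 3 requests in flight
    urls = ['u%d' % i for i in range(10)]
    return await asyncio.gather(*(fake_request(sema, u) for u in urls))

results = asyncio.run(main())
print(results == ['u%d' % i for i in range(10)])  # True: gather keeps order
print(peak <= 3)                                  # True: limit never exceeded
```

For a real API you would combine both mechanisms: the 0.6 s spacing to respect the per-minute rate and the semaphore to cap concurrency.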