Question

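There is a game called Guild Wars 2, and it offers an API for querying almost everything in the game database. My goal is to write a simple crawler with Python asyncio and aiohttp that fetches information about every item in the Guild Wars 2 game database.

I wrote a short program that works, but it behaves somewhat strangely. I suspect there is something I don't understand about composing coroutines.

First, I sent a request with Postman, and the X-Rate-Limit-Limit response header was 600. So I assume requests are limited to 600 per minute?

Here are my questions.

1. After the program finished, I checked some of the JSON files and found that several of them have identical content: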

[{"name": "Endless Fractal Challenge Mote Tonic", "description": "Transform into a Challenge Mote for 15 minutes or until hit. You cannot move while transformed."......

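This means those requests got a bad response, but I don't know why.

2. I tried asyncio.Semaphore, but even with concurrency limited to 5, the requests quickly exceeded 600. So I tried to control the timing by adding time.sleep(0.2) at the end of the request_item function. My guess was that time.sleep(0.2) would suspend the whole Python process for 0.2 seconds, and it did work at first, but after running for a while the program hung for a long time and then produced many failed attempts. Every automatic retry failed as well. I am confused by this behavior.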

async def request_item(session, item_id):
    req_param_item = req_param
    req_param_item['ids'] = item_id
    # retry for 3 times when exception occurs.
    for i in range(3):
        try:
            async with session.get(url_template, params=req_param_item) as response:
                result = await response.json()
                with open(f'item_info/{item_id}.json', 'w') as f:
                    json.dump(result, f)
                print(item_id, 'done')
            break
        except Exception as e:
            print(item_id, i, 'failed')
            continue
    time.sleep(0.2)

When I moved time.sleep(0.2) into the for loop inside request_item, the whole program hung. I have no idea what is going on.

async def request_item(session, item_id):
    req_param_item = req_param
    req_param_item['ids'] = item_id
    for i in range(3):
        try:
            time.sleep(0.2)
            async with session.get(url_template, params=req_param_item) as response:
                result = await response.json()
                with open(f'item_info/{item_id}.json', 'w') as f:
                    json.dump(result, f)
                print(item_id, 'done')
            break
        except Exception as e:
            print(item_id, i, 'failed')
            continue

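Can someone explain this? Is there a better solution? I have thought of a few, but I could not test them. For example, read loop.time() and pause the whole event loop after every 600 requests. Or add 600 requests to task_list, gather them as a group, and once they finish, call asyncio.run(get_item(req_ids)) again with another 600 requests.

Here is all of my code.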

import aiohttp
import asyncio
import httpx
import json
import math
import os
import time

tk = 'xxxxxxxx'
url_template = 'https://api.guildwars2.com/v2/items'

# get items list
req_param = {'access_token': tk}
item_list_resp = httpx.get(url_template, params=req_param)
items = item_list_resp.json()

async def request_item(session, item_id):
    req_param_item = req_param
    req_param_item['ids'] = item_id
    for i in range(3):
        try:
            async with session.get(url_template, params=req_param_item) as response:
                result = await response.json()
                with open(f'item_info/{item_id}.json', 'w') as f:
                    json.dump(result, f)
                print(item_id, 'done')
            break
        except Exception as e:
            print(item_id, i, 'failed')
            continue
    # since the game API limit requests, I think it's ok to suspend program for a while
    time.sleep(0.2)

async def get_item(item_ids: list):
    task_list = []
    async with aiohttp.ClientSession() as session:
        for item_id in item_ids:
            req = request_item(session, item_id)
            task = asyncio.create_task(req)
            task_list.append(task) 
        await asyncio.gather(*task_list)

asyncio.run(get_item(req_ids))

Answer 1

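You are using time.sleep() instead of await asyncio.sleep(). It blocks the whole execution for N seconds, and you are doing it in the wrong place.

Here is what happens. When you run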

for item_id in item_ids:
   req = request_item(session, item_id)
   task = asyncio.create_task(req)
   task_list.append(task)

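you only schedule your requests; you don't run them. Say you have 1000 item_ids: you schedule 1000 tasks, and when you run await asyncio.gather(*task_list) you actually wait for all 1000 of those tasks to execute. They all fire at once.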
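A minimal sketch of that scheduling behavior, using a hypothetical `job` coroutine in place of request_item (the names `job`, `started`, and `main` are assumptions made for the illustration). It records which jobs have started, and shows that nothing has run at the point where the tasks have only been created:

```python
import asyncio

started = []  # ids of jobs that have actually begun running


async def job(n):
    # Hypothetical stand-in for request_item.
    started.append(n)
    await asyncio.sleep(0)  # yield to the event loop once


async def main():
    # create_task() only schedules the coroutines on the event loop.
    tasks = [asyncio.create_task(job(n)) for n in range(3)]
    before = list(started)  # still empty: no task has run yet

    # The tasks start running only once this coroutine awaits.
    await asyncio.gather(*tasks)
    after = list(started)
    return before, after


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With 1000 item ids in place of `range(3)`, all 1000 requests would be released the same way as soon as gather() is awaited.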

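But inside each task you run time.sleep(0.2), so you have to wait 1000 * 0.2 seconds in total. Remember that all the tasks run at once and, in general, in no guaranteed order. So task 1 runs and you wait 0.2 seconds, then task 2 fires and you wait 0.2 seconds, then task 999 fires and you wait 0.2 seconds, and so on.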
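The difference between the two sleep calls can be seen in a small timing sketch. The worker names and the 0.05 second delay are assumptions chosen to keep the demo short; no network is involved:

```python
import asyncio
import time


async def blocking_worker():
    # time.sleep() blocks the whole event loop: no other task can run meanwhile.
    time.sleep(0.05)


async def cooperative_worker():
    # asyncio.sleep() suspends only this task and lets the others proceed.
    await asyncio.sleep(0.05)


async def timed(worker, count):
    # Run `count` workers concurrently and return the elapsed wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(worker() for _ in range(count)))
    return time.perf_counter() - start


async def compare(count=5):
    blocking = await timed(blocking_worker, count)
    cooperative = await timed(cooperative_worker, count)
    return blocking, cooperative


if __name__ == "__main__":
    blocking, cooperative = asyncio.run(compare())
    print(f"time.sleep: {blocking:.2f}s, asyncio.sleep: {cooperative:.2f}s")
```

The blocking version takes roughly count * 0.05 seconds because the sleeps run one after another, while the cooperative version takes roughly one delay in total. While the loop is blocked, pending responses are not processed either, which may be related to the long hangs and failed attempts described in the question.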

The simplest solution is to wait one minute after firing 600 requests. You need to slow down inside get_item. Example code (I have not tested it):

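async def get_item(item_ids: list):
    task_list = []
    async with aiohttp.ClientSession() as session:
        # start=1 so the first pause comes after 600 requests, not after the first one
        for n, item_id in enumerate(item_ids, start=1):
            req = request_item(session, item_id)
            task = asyncio.create_task(req)
            task_list.append(task)
            if n % 600 == 0:
                # wait for the current batch, then pause for a minute
                await asyncio.gather(*task_list)
                await asyncio.sleep(60)
                task_list = []
        # wait for the final, partial batch
        await asyncio.gather(*task_list)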

I recommend using the asyncio-throttle library.
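To illustrate the idea behind such a throttle without depending on any third-party package, here is a standard-library sketch of a sliding-window limiter. The class `SlidingWindowLimiter`, the `fake_request` coroutine, and the small demo numbers (3 calls per 0.2 seconds) are all assumptions for the example; this is not the asyncio-throttle API. For the limit discussed here you would use something like max_calls=600 and period=60.

```python
import asyncio
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical helper: let at most max_calls callers pass per period seconds."""

    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period
        self._starts = deque()  # timestamps of recently admitted calls
        self._lock = asyncio.Lock()

    async def wait(self):
        # The lock serializes callers, so they are admitted one at a time.
        async with self._lock:
            now = time.monotonic()
            # Forget calls that have already left the window.
            while self._starts and now - self._starts[0] >= self.period:
                self._starts.popleft()
            if len(self._starts) >= self.max_calls:
                # Window is full: sleep until the oldest call expires.
                await asyncio.sleep(self.period - (now - self._starts[0]))
                self._starts.popleft()
            self._starts.append(time.monotonic())


async def fake_request(limiter, item_id, log):
    # Stand-in for request_item: wait for permission, then record the time.
    await limiter.wait()
    log.append((item_id, time.monotonic()))


async def main():
    limiter = SlidingWindowLimiter(max_calls=3, period=0.2)
    log = []
    await asyncio.gather(*(fake_request(limiter, i, log) for i in range(7)))
    return log


if __name__ == "__main__":
    for item_id, stamp in asyncio.run(main()):
        print(item_id, f"{stamp:.2f}")
```

Each request coroutine awaits the limiter before calling session.get, so all tasks can still be created up front while the actual requests are spread out over time.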

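PS. With a rate limit of 600 per minute, I don't think you need asyncio at all, because I am pretty sure 600 concurrent requests will complete in 5 to 10 seconds. Double-check whether your 600 requests really take more than 1 minute with classic requests and threads.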
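On the first problem (several JSON files with identical content), one possible contributor, which is a guess and not something confirmed in this thread, is the line `req_param_item = req_param`. It does not copy the dictionary; every task writes its id into the same shared object, so anything that reads the dictionary after a suspension point, such as a retry, may see another task's item id. A minimal sketch with hypothetical function names and no network access:

```python
import asyncio

base_params = {"access_token": "xxxxxxxx"}


async def build_shared(item_id):
    params = base_params  # alias: same dict object for every task
    params["ids"] = item_id
    await asyncio.sleep(0)  # other tasks run here and overwrite "ids"
    return params["ids"]


async def build_copied(item_id):
    params = {**base_params, "ids": item_id}  # fresh dict per task
    await asyncio.sleep(0)
    return params["ids"]


async def main():
    shared = await asyncio.gather(*(build_shared(i) for i in range(3)))
    copied = await asyncio.gather(*(build_copied(i) for i in range(3)))
    return shared, copied


if __name__ == "__main__":
    shared, copied = asyncio.run(main())
    print("shared dict:", shared)
    print("copied dict:", copied)
```

In the shared version every task ends up seeing the id written by the last task, while the copied version keeps one id per task. Building a new dictionary inside request_item avoids the issue regardless of whether it is the cause here.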
