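Making Python async requests faster

Time: 2019-06-03 08:01:09

Tags: python python-requests python-asyncio aiohttp

I am writing a get method that takes an array of IDs and then makes a request for each ID. The array can hold 500+ IDs, and right now the requests take 20+ minutes. I have tried a few different async approaches, such as aiohttp and async, but none of them made the requests any faster. Here is my code: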

async def get(self):
    self.set_header("Access-Control-Allow-Origin", "*")
    story_list = []
    duplicates = []
    loop = asyncio.get_event_loop()
    ids = loop.run_in_executor(None, requests.get, 'https://hacker-news.firebaseio.com/v0/newstories.json?print=pretty')
    response = await ids
    response_data = response.json()
    print(response.text)
    for url in response_data:
        if url not in duplicates:
            duplicates.append(url)
            stories = loop.run_in_executor(None, requests.get, "https://hacker-news.firebaseio.com/v0/item/{}.json?print=pretty".format(
            url))
            data = await stories
            if data.status_code == 200 and len(data.text) > 5:
                print(data.status_code)
                print(data.text)
                story_list.append(data.json())

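Is there a way to use multithreading to make the requests faster?

1 answer:

Answer 0 (score: 3)

The main problem here is that the code is not really asynchronous.

After getting the list of URLs, you fetch them one at a time and wait for each response before starting the next request.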
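
To make that cost concrete, here is a minimal sketch that reproduces the same loop shape without any network access. It assumes a hypothetical blocking function `slow_call` standing in for `requests.get` (it only sleeps for 0.1 s); `fetch_one_at_a_time` and `timed_sequential` are likewise illustrative names, not part of the question's code. Because each future is awaited inside the loop, the next call is not submitted until the previous one has returned, so the total time is roughly the sum of the individual delays.

```python
import asyncio
import time


def slow_call(item_id):
    """Hypothetical stand-in for a blocking requests.get call."""
    time.sleep(0.1)
    return item_id


async def fetch_one_at_a_time(ids):
    loop = asyncio.get_running_loop()
    results = []
    for item_id in ids:
        # Awaiting here suspends the loop body until this call finishes,
        # so only one worker thread is ever busy at a time.
        results.append(await loop.run_in_executor(None, slow_call, item_id))
    return results


def timed_sequential(ids):
    """Run the sequential version and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(fetch_one_at_a_time(ids))
    return results, time.perf_counter() - start
```

Calling `timed_sequential([1, 2, 3, 4, 5])` should take about half a second even though the calls run in a thread pool, which is the same effect that stretches 500+ real requests into many minutes.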

A better idea is to filter out the duplicates first (using a set), then queue all of the URLs in the executor and wait for all of them to finish, e.g.:

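import asyncio
from concurrent.futures import ThreadPoolExecutor

import requests


# Handler method, as in the question
async def get(self):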
    self.set_header("Access-Control-Allow-Origin", "*")
    stories = []
    loop = asyncio.get_event_loop()
    # Single executor to share resources
    executor = ThreadPoolExecutor()

    # Get the initial set of ids
    response = await loop.run_in_executor(executor, requests.get, 'https://hacker-news.firebaseio.com/v0/newstories.json?print=pretty')
    response_data = response.json()
    print(response.text)

    # Putting them in a set will remove duplicates
    urls = set(response_data)

    # Build the set of futures (returned by run_in_executor) and wait for them all to complete
    responses = await asyncio.gather(*[
        loop.run_in_executor(
            executor, requests.get, 
            "https://hacker-news.firebaseio.com/v0/item/{}.json?print=pretty".format(url)
        ) for url in urls
    ])

    # Process the responses
    for response in responses:
        if response.status_code == 200 and len(response.text) > 5:
            print(response.status_code)
            print(response.text)
            stories.append(response.json())

    return stories
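
The pattern can be checked without hitting the network. The following is only a sketch under stated assumptions: `fake_get` is a hypothetical stand-in for `requests.get` that sleeps for 0.1 s, `fetch_all` and `timed_fetch` are illustrative names, and `max_workers=20` is an arbitrary value rather than a recommendation. It also swaps `set` for `dict.fromkeys`, which removes duplicates while keeping the original order of the IDs.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def fake_get(item_id):
    """Hypothetical stand-in for requests.get: blocks, then returns a dict."""
    time.sleep(0.1)
    return {"id": item_id}


async def fetch_all(ids, max_workers=20):
    loop = asyncio.get_running_loop()
    # dict.fromkeys drops duplicates while keeping first-seen order
    unique_ids = list(dict.fromkeys(ids))
    # The with-block shuts the pool down after the awaited calls are done
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            loop.run_in_executor(executor, fake_get, item_id)
            for item_id in unique_ids
        ]
        # gather returns the results in the same order as the futures
        return await asyncio.gather(*futures)


def timed_fetch(ids):
    """Run the concurrent version and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(fetch_all(ids))
    return results, time.perf_counter() - start
```

With ten unique IDs and twenty workers, `timed_fetch` should finish in roughly the time of a single call instead of ten. The number of workers caps how many requests are in flight at once. When `max_workers` is not given, `ThreadPoolExecutor` in Python 3.8 and later defaults to `min(32, CPU count + 4)`, so 500 IDs would still be processed in batches of that size. Passing an explicit value is one way to tune this.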