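Asynchronous Requests with Python requests

Time: 2012-02-02 10:20:08

Tags: python asynchronous python-requests httprequest

I tried the sample provided in the documentation of the requests library for Python.

With async.map(rs), I get the response codes, but I want to get the content of each page requested. This, for example, does not work: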

out = async.map(rs)
print out[0].content

14 answers:

Answer 0 (score: 133)

Note

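The answer below does not apply to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you can replace requests with grequests below and it should work.

I have left this answer as is to reflect the original question, which was about using requests < v0.13.0.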


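To do multiple tasks asynchronously with async.map, you have to:

  1. Define a function for what you want to do with each object (your task)
  2. Add that function as an event hook in your request
  3. Call async.map on a list of all the requests/actions

Example: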

    from requests import async
    # If using requests > v0.13.0, use
    # from grequests import async
    
    urls = [
        'http://python-requests.org',
        'http://httpbin.org',
        'http://python-guide.org',
        'http://kennethreitz.com'
    ]
    
    # A simple task to do to each response object
    def do_something(response):
        print response.url
    
    # A list to hold our things to do via async
    async_list = []
    
    for u in urls:
        # The "hooks = {..." part is where you define what you want to do
        # 
        # Note the lack of parentheses following do_something, this is
        # because the response will be used as the first argument automatically
        action_item = async.get(u, hooks = {'response' : do_something})
    
        # Add the task to our list of things to do via async
        async_list.append(action_item)
    
    # Do our list of things to do via async
    async.map(async_list)
    

Answer 1 (score: 69)

async is now an independent module: grequests.

See here: https://github.com/kennethreitz/grequests

And there: Ideal method for sending multiple HTTP requests over Python?

Installation:

$ pip install grequests

Usage:

Build a stack:

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)

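Send the stack:

grequests.map(rs)

The result looks like:

[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]

grequests does not seem to set a limit on concurrent requests, i.e. when multiple requests are sent to the same server.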
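
One way to cap concurrency yourself is to put a semaphore around each request. The sketch below is not grequests code: it swaps in asyncio from the standard library to show the idea. `fake_fetch` is a hypothetical stand-in that sleeps instead of making an HTTP call, and `limit=2` is an arbitrary illustrative value.

```python
import asyncio

async def fake_fetch(url, stats):
    # Hypothetical stand-in for a real HTTP call: it only sleeps briefly
    # and records how many fetches are in flight at once.
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return url

async def fetch_all(urls, limit):
    # The semaphore lets at most `limit` fetches run at the same time.
    semaphore = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}

    async def bounded(url):
        async with semaphore:
            return await fake_fetch(url, stats)

    # gather returns results in the same order as the input URLs.
    results = await asyncio.gather(*(bounded(u) for u in urls))
    return results, stats["peak"]

urls = ["http://httpbin.org/get?n=%d" % i for i in range(6)]
results, peak = asyncio.run(fetch_all(urls, limit=2))
print(len(results), "responses, peak concurrency:", peak)
```

With a real client, only `fake_fetch` would change; the semaphore wrapper stays the same.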

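Answer 2 (score: 40)

I tested both requests-futures and grequests. grequests is faster but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own and simply wrapped requests in a ThreadPoolExecutor. It is almost as fast as grequests, but without external dependencies.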

import requests
import concurrent.futures

def get_urls():
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

# Counters for failed and successful requests
resp_err = 0
resp_ok = 0

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    # Map each submitted future back to the URL it fetches
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1

Answer 3 (score: 26)

Maybe requests-futures is another choice.

from requests_futures.sessions import FuturesSession

session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second request is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)

It is also recommended in the official documentation. If you don't want to involve gevent, it's a good choice.

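Answer 4 (score: 7)

I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.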

list_of_requests = ['http://moop.com', 'http://doop.com', ...]

from simple_requests import Requests
for response in Requests().swarm(list_of_requests):
    print response.content

Documentation is here: http://pythonhosted.org/simple-requests/

Answer 5 (score: 5)

from threading import Thread
import urllib2  # Python 2; on Python 3 use urllib.request instead

threads = list()

for requestURI in requests:
    t = Thread(target=self.openURL, args=(requestURI,))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

...

def openURL(self, requestURI):
    o = urllib2.urlopen(requestURI, timeout = 600)
    o...

Answer 6 (score: 5)

You can use httpx for that.

import asyncio
import httpx

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ["http://google.com", "http://wikipedia.org"]

# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))

If you want a functional syntax, the gamla library wraps this into get_async.

Then you can do


await gamla.map(gamla.get_async(10))(["http://google.com", "http://wikipedia.org"])

The 10 is the timeout in seconds.

(Disclaimer: I am its author.)

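Answer 7 (score: 5)

Unfortunately, as far as I know, the requests library is not equipped for performing asynchronous requests. You can wrap async/await syntax around requests, but that makes the underlying requests no less synchronous. If you want true async requests, you must use other tooling that provides them. One such solution is aiohttp (Python 3.5.3+). In my experience it works well with the Python 3.7 async/await syntax. Below I have written three implementations of performing n web requests:

  1. Purely synchronous requests (sync_requests_get_all) using the Python requests library
  2. Synchronous requests (async_requests_get_all) using the Python requests library wrapped in Python 3.7 async/await syntax and asyncio
  3. A truly asynchronous implementation (async_aiohttp_get_all) with the Python aiohttp library wrapped in Python 3.7 async/await syntax and asyncio
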
import time
import asyncio
import requests
import aiohttp

from types import SimpleNamespace

durations = []


def timed(func):
    """
    records approximate durations of function calls
    """
    def wrapper(*args, **kwargs):
        start = time.time()
        print(f'{func.__name__:<30} started')
        result = func(*args, **kwargs)
        duration = f'{func.__name__:<30} finsished in {time.time() - start:.2f} seconds'
        print(duration)
        durations.append(duration)
        return result
    return wrapper


async def fetch(url, session):
    """
    asynchronous get request
    """
    async with session.get(url) as response:
        response_json = await response.json()
        return SimpleNamespace(**response_json)


async def fetch_many(loop, urls):
    """
    many asynchronous get requests, gathered
    """
    async with aiohttp.ClientSession() as session:
        tasks = [loop.create_task(fetch(url, session)) for url in urls]
        return await asyncio.gather(*tasks)


@timed
def asnyc_aiohttp_get_all(urls):
    """
    performs asynchronous get requests
    """
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(fetch_many(loop, urls))


@timed
def sync_requests_get_all(urls):
    """
    performs synchronous get requests
    """
    # use session to reduce network overhead
    session = requests.Session()
    return [SimpleNamespace(**session.get(url).json()) for url in urls]


@timed
def async_requests_get_all(urls):
    """
    asynchronous wrapper around synchronous requests
    """
    loop = asyncio.get_event_loop()
    # use session to reduce network overhead
    session = requests.Session()

    async def async_get(url):
        return session.get(url)

    async_tasks = [loop.create_task(async_get(url)) for url in urls]
    return loop.run_until_complete(asyncio.gather(*async_tasks))


if __name__ == '__main__':
    # this endpoint takes ~3 seconds to respond,
    # so a purely synchronous implementation should take
    # little more than 30 seconds and a purely asynchronous
    # implementation should take little more than 3 seconds.
    urls = ['https://postman-echo.com/delay/3']*10

    sync_requests_get_all(urls)
    async_requests_get_all(urls)
    asnyc_aiohttp_get_all(urls)
    print('----------------------')
    [print(duration) for duration in durations]

On my machine, this is the output:

sync_requests_get_all          started
sync_requests_get_all          finsished in 30.92 seconds
async_requests_get_all         started
async_requests_get_all         finsished in 30.87 seconds
asnyc_aiohttp_get_all          started
asnyc_aiohttp_get_all          finsished in 3.22 seconds
----------------------
sync_requests_get_all          finsished in 30.92 seconds
async_requests_get_all         finsished in 30.87 seconds
asnyc_aiohttp_get_all          finsished in 3.22 seconds
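
The middle result is no faster because the blocking call runs inside the coroutine and stalls the event loop. One common workaround, if you have to keep a blocking client, is to hand each call to a worker thread with `loop.run_in_executor`. The sketch below assumes a hypothetical `blocking_get` that sleeps instead of calling `requests.get`; the 0.2-second delay is illustrative. This overlaps the waits using threads; it does not make the underlying call asynchronous.

```python
import asyncio
import time

def blocking_get(url):
    # Hypothetical stand-in for a blocking call such as requests.get(url).
    time.sleep(0.2)
    return url

async def blocking_in_coroutine(urls):
    # Calling the blocking function directly stalls the event loop,
    # so the calls run one after another.
    async def call(url):
        return blocking_get(url)
    return await asyncio.gather(*(call(u) for u in urls))

async def blocking_in_threads(urls):
    # run_in_executor moves each call to the default thread pool,
    # so the waits overlap.
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(None, blocking_get, u) for u in urls]
    return await asyncio.gather(*futures)

def timed_run(coro_func, urls):
    # Run the coroutine to completion and measure wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(coro_func(urls))
    return results, time.perf_counter() - start

urls = ["url%d" % i for i in range(4)]
serial_results, serial_time = timed_run(blocking_in_coroutine, urls)
threaded_results, threaded_time = timed_run(blocking_in_threads, urls)
print("blocking in coroutine: %.2fs, in threads: %.2fs" % (serial_time, threaded_time))
```

The first variant should take roughly the sum of the delays and the second roughly one delay, which mirrors the gap between the sync and aiohttp timings.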

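Answer 8 (score: 2)

I have been using python requests for async calls against GitHub's gist API.

For an example, see the code here:

https://github.com/davidthewatson/flasgist/blob/master/views.py#L60-72

This style of Python may not be the clearest example, but I can assure you that the code works. Let me know if it is confusing and I will document it.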

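Answer 9 (score: 2)

Disclaimer: The following code creates different threads for each function.

This might be useful in some cases, as it is simpler to use. But know that it is not async: it gives the illusion of async using multiple threads, even though the decorator's name suggests it is.

You can use the following decorator to supply a callback that runs once the function finishes executing. The callback must handle the data returned by the function.

Note that after the function is decorated, it returns a Future object.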

import asyncio

## Decorator implementation of async runner !!
def run_async(callback, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()

    def inner(func):
        def wrapper(*args, **kwargs):
            def __exec():
                out = func(*args, **kwargs)
                callback(out)
                return out

            return loop.run_in_executor(None, __exec)

        return wrapper

    return inner

Example of implementation:

import requests

urls = ["https://google.com", "https://facebook.com", "https://apple.com", "https://netflix.com"]
loaded_urls = []  # OPTIONAL, used for showing realtime, which urls are loaded !!


def _callback(resp):
    print(resp.url)
    print(resp)
    loaded_urls.append((resp.url, resp))  # OPTIONAL, used for showing realtime, which urls are loaded !!


# Must provide a callback function, callback func will be executed after the func completes execution
# Callback function will accept the value returned by the function.
@run_async(_callback)
def get(url):
    return requests.get(url)


for url in urls:
    get(url)

If you want to see in real time which URLs have loaded, you can also add the following code at the end:

while True:
    print(loaded_urls)
    if len(loaded_urls) == len(urls):
        break

Answer 10 (score: 1)

If you want to use asyncio, then requests-async provides async/await functionality for requests: https://github.com/encode/requests-async

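Answer 11 (score: 1)

I second the suggestion above to use HTTPX, but I often use it in a different way, so I am adding my answer.

I personally use asyncio.run (introduced in Python 3.7) rather than asyncio.gather, and I also prefer the aiostream approach, which can be used in combination with asyncio and httpx.

As in this example I just posted, this style is helpful for processing a set of URLs asynchronously even when (common) errors occur. I particularly like how it clarifies where the response processing happens, and how it eases error handling (which I find async calls tend to need more of).

It is easier to post a simple example of just firing off a bunch of requests asynchronously, but often you also want to handle the response content (compute something with it, perhaps with reference to the original object that the requested URL relates to).

The core of that approach looks like: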

async with httpx.AsyncClient(timeout=timeout) as session:
    ws = stream.repeat(session)
    xs = stream.zip(ws, stream.iterate(urls))
    ys = stream.starmap(xs, fetch, ordered=False, task_limit=20)
    process = partial(process_thing, things=things, pbar=pbar, verbose=verbose)
    zs = stream.map(ys, process)
    return await zs

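where:

  • process_thing is an async response content handling function
  • things is the input list (which the urls generator of URL strings came from), e.g. a list of objects/dictionaries
  • pbar is a progress bar (e.g. tqdm.tqdm) [optional but useful]

All of that goes in an async function, async_fetch_urlset, which is then run by calling a synchronous "top-level" function named e.g. fetch_things. That function runs the coroutine [this is what an async function returns] and manages the event loop: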

def fetch_things(urls, things, pbar=None, verbose=False):
    return asyncio.run(async_fetch_urlset(urls, things, pbar, verbose))

Since a list passed as input (here, things) can be modified in place, you effectively get output back (as we are used to from synchronous function calls).
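
The in-place pattern can be sketched with only the standard library, without aiostream or httpx. Everything here is simplified, hypothetical scaffolding rather than the code from the linked example: `fake_fetch` returns a made-up body instead of making a request, `async_fetch_all` is an invented name, and the `url` and `length` keys are illustrative.

```python
import asyncio

async def fake_fetch(url):
    # Hypothetical stand-in for an HTTP request; returns a made-up body.
    await asyncio.sleep(0.01)
    return "body of " + url

async def process_thing(thing):
    # Store the result on the input object itself instead of returning it.
    body = await fake_fetch(thing["url"])
    thing["length"] = len(body)

async def async_fetch_all(things):
    # Process every input object concurrently.
    await asyncio.gather(*(process_thing(t) for t in things))

def fetch_things(things):
    # Synchronous entry point: asyncio.run creates and closes the event loop.
    asyncio.run(async_fetch_all(things))
    return things

things = [{"url": "http://a.example"}, {"url": "http://bb.example"}]
fetch_things(things)
print(things)
```

The caller never touches a coroutine: it passes a list in and reads the updated objects afterwards.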

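Answer 12 (score: 0)

I have also tried some things using the asynchronous methods in Python, but I have had much better luck using Twisted for asynchronous programming. It has fewer problems and is well documented. Here is a link to something similar to what you are trying, done in Twisted.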

http://pythonquirks.blogspot.com/2011/04/twisted-asynchronous-http-request.html

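Answer 13 (score: 0)

So many third-party libraries are suggested here, and almost all of them are deprecated, point elsewhere, or have severely limited functionality.

Most of the answers posted have a lot of problems: they either use deprecated libraries that have been ported over with limited features, or provide a solution with too much magic in how the request is executed, which makes error handling difficult.

Some of the solutions work fine for HTTP requests, but they fall short for any other kind of request.

I believe that the Python built-in library asyncio alone is sufficient to perform asynchronous requests of any type, and that it provides enough flexibility for complex and use-case-specific error handling.

How it works is simple. You basically create a series of tasks you want to run asynchronously, then ask a loop to execute those tasks and exit on completion. There are no extra libraries that might go unmaintained, and no missing functionality.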
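
A minimal sketch of that task-based flow, using only asyncio. `do_request` is a hypothetical placeholder that sleeps and optionally raises; a real version would perform actual I/O. Passing `return_exceptions=True` to `asyncio.gather` makes each failure come back as a value, so errors can be inspected per task.

```python
import asyncio

async def do_request(name, fail=False):
    # Hypothetical placeholder for real I/O.
    await asyncio.sleep(0.01)
    if fail:
        raise ConnectionError("request %s failed" % name)
    return "result for %s" % name

async def run_all(jobs):
    # Create one task per job, then wait for all of them to finish.
    tasks = [asyncio.create_task(do_request(name, fail)) for name, fail in jobs]
    # return_exceptions=True returns raised exceptions as results
    # instead of propagating the first one.
    return await asyncio.gather(*tasks, return_exceptions=True)

jobs = [("a", False), ("b", True), ("c", False)]
outcomes = asyncio.run(run_all(jobs))
for (name, _), outcome in zip(jobs, outcomes):
    if isinstance(outcome, Exception):
        print(name, "error:", outcome)
    else:
        print(name, "ok:", outcome)
```

`asyncio.run` starts the loop, runs the tasks to completion and closes the loop, which matches the "execute and exit" flow described above.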