Question

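Problem: I need to send many HTTP requests to a server. I can only use one connection (a non-negotiable server limitation). The server's response time plus the network latency is too high, so I'm falling behind.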

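The requests usually don't change server state and don't depend on the responses to earlier requests. So my idea is to simply send them back to back, queue the response objects, and rely on the Content-Length: header of each incoming response to hand the incoming reply to the next waiting response object. In other words: pipeline the requests to the server.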

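This is of course not completely safe (any reply without a Content-Length: header means trouble), but I don't care; in that case I can always retry any queued requests. (The safe way is to wait for the headers before sending the next bit. That might already help me enough; I can't test it in advance.)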
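
The framing step of this plan can be sketched as a pure Python 3 function: given the bytes read so far from the single connection, peel off complete responses in order using Content-Length, keep any partial tail for the next read, and report when a reply without Content-Length shows up, which is exactly the point where the queued requests would have to be retried. `split_pipelined_responses` is a hypothetical name, not part of any library; chunked replies also lack Content-Length and are treated the same way.

```python
def split_pipelined_responses(buf):
    """Split back-to-back HTTP/1.1 responses out of a byte buffer (sketch).

    Returns (responses, rest, broken):
      responses -- complete (status_line, headers, body) tuples, in order
      rest      -- unconsumed bytes (a partial response, or the bad one)
      broken    -- True if a reply without Content-Length was hit; its end is
                   unknown, so every request still queued must be retried
    """
    responses = []
    pos = 0
    while True:
        head_end = buf.find(b"\r\n\r\n", pos)
        if head_end == -1:
            return responses, buf[pos:], False   # headers still incomplete
        lines = buf[pos:head_end].decode("iso-8859-1").split("\r\n")
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        if "content-length" not in headers:
            return responses, buf[pos:], True    # cannot locate the next reply
        body_start = head_end + 4
        body_end = body_start + int(headers["content-length"])
        if body_end > len(buf):
            return responses, buf[pos:], False   # body still incomplete
        responses.append((lines[0], headers, buf[body_start:body_end]))
        pos = body_end
```

In use, each `recv()` result would be appended to `rest`, the function called again, each returned response handed to the next waiting response object, and on `broken` the client would reconnect and resend whatever is still queued.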

So, ideally, I'd like the following client code (which uses client-side delays to simulate network latency) to finish in three seconds.

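Now the $64,000 question: is there a Python library that already does this, or do I need to roll my own? My code uses gevent; I could switch to Twisted if necessary, but Twisted's standard connection pool doesn't support pipelined requests. If need be, I could also write a wrapper for some C library, but I'd prefer native code.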

#!/usr/bin/python

import gevent.pool
from gevent import sleep
from time import time

from geventhttpclient import HTTPClient

url = 'http://local_server/100k_of_lorem_ipsum.txt'
http = HTTPClient.from_url(url, concurrency=1)

def get_it(http):
    print time(),"Queueing request"
    response = http.get(url)
    print time(),"Expect header data"
    # Do something with the header, just to make sure that it has arrived
    # (the greenlet should block until then)
    assert response.status_code == 200
    # Header values are strings; convert before comparing with a number
    assert int(response["content-length"]) > 0
    for h in response.items():
        pass

    print time(),"Wait before reading body data"
    # Now I can read the body. The library should send at
    # least one new HTTP request during this time.
    sleep(2)
    print time(),"Reading body data"
    while response.read(10000):
        pass
    print time(),"Processing my response"
    # The next request should definitely be transmitted NOW.
    sleep(1)
    print time(),"Done"

# Run parallel requests
pool = gevent.pool.Pool(3)
for i in range(3):
    pool.spawn(get_it, http)

pool.join()
http.close()

Answer 1

Dugong is an HTTP/1.1-only client that claims to support true HTTP/1.1 pipelining. The tutorial contains some examples of how to use it, including one using threads and another using asyncio.

Be sure to confirm that the server you are talking to actually supports HTTP/1.1 pipelining; some servers claim to support HTTP/1.1 but don't implement pipelining.
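
One way to do that check is to write several requests to one socket in a single send and see whether the same number of complete responses comes back. A minimal Python 3 sketch using only the standard library follows; `check_pipelining` and `read_response` are hypothetical helper names, and the sketch assumes every reply carries a Content-Length. A server that does not pipeline will typically close the connection after the first reply (a `ConnectionError` here) or stall until the socket timeout.

```python
import socket


def read_response(f):
    """Read one response whose body length is given by Content-Length."""
    status = f.readline().decode("iso-8859-1").rstrip("\r\n")
    if not status:
        raise ConnectionError("server closed the connection")
    headers = {}
    while True:
        line = f.readline().decode("iso-8859-1").rstrip("\r\n")
        if not line:                      # blank line ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if "content-length" not in headers:
        raise ValueError("no Content-Length in reply: " + status)
    return status, f.read(int(headers["content-length"]))


def check_pipelining(host, port, path="/", count=3, timeout=5.0):
    """Write `count` GETs in one go, then read `count` responses back."""
    request = ("GET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request * count)     # nothing is read until all are sent
        with sock.makefile("rb") as f:    # one buffered reader for all replies
            return [read_response(f) for _ in range(count)]
```

Using a single buffered file object for all reads matters: bytes of the second response may already sit in the buffer after the first one is read.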

Answer 2

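I think txrequests can get you most of what you need, using background_callback to queue processing of the responses on a separate thread. Each request still runs in its own thread, but because it uses a session by default, it will reuse the same connection.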

https://github.com/tardyp/txrequests#working-in-the-background

Answer 3

Looks like you're running Python 2.

For Python 3 >= 3.5, you can use async/await with an event loop; see asyncio.
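
A minimal asyncio sketch of the pipelining idea from the question, assuming Python 3.7+ (`StreamWriter.wait_closed` is newer than `async`/`await` itself) and replies that carry Content-Length. One coroutine writes all requests without waiting for any reply while another reads the responses back in request order over the same connection; `pipelined_get` and `read_response` are hypothetical names.

```python
import asyncio


async def read_response(reader):
    """Read one response whose body length is given by Content-Length."""
    status = (await reader.readline()).decode("iso-8859-1").rstrip("\r\n")
    if not status:
        raise ConnectionError("server closed the connection")
    headers = {}
    while True:
        line = (await reader.readline()).decode("iso-8859-1").rstrip("\r\n")
        if not line:                      # blank line ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if "content-length" not in headers:
        raise ValueError("no Content-Length; cannot keep pipelining")
    return status, await reader.readexactly(int(headers["content-length"]))


async def pipelined_get(host, port, paths):
    """Send every GET on one connection without waiting; return replies in order."""
    reader, writer = await asyncio.open_connection(host, port)

    async def send_requests():
        for path in paths:
            writer.write(("GET %s HTTP/1.1\r\nHost: %s\r\n\r\n"
                          % (path, host)).encode("ascii"))
            await writer.drain()          # flow control only; never waits for a reply

    async def receive_responses():
        # HTTP/1.1 servers answer pipelined requests in the order received.
        return [await read_response(reader) for _ in paths]

    try:
        _, responses = await asyncio.gather(send_requests(), receive_responses())
        return responses
    finally:
        writer.close()
        await writer.wait_closed()
```

Usage would look something like `asyncio.run(pipelined_get("local_server", 80, ["/a", "/b", "/c"]))`. Sending and receiving run concurrently so that a long queue of requests cannot block on a full socket buffer while nobody is reading replies.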

There is also a third-party library called Trio, available via pip, that makes this kind of code nicer and more convenient to write.

Another thing I can think of is multithreading with locks. I'll think about how to explain that better, or whether it would even work.
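
A rough Python 3 sketch of the threads-and-locks idea, using only the standard library: each caller thread gets a Future, a lock makes the order of requests on the wire match the order of a FIFO queue, and a single reader thread completes the Futures as responses arrive. `PipelinedConnection` is a hypothetical class; like the plan in the question, it gives up on the connection as soon as a reply cannot be framed (for example, one without Content-Length) and fails every queued request so the caller can retry.

```python
import collections
import socket
import threading
from concurrent.futures import Future


class PipelinedConnection:
    """Hypothetical helper: many threads share one pipelined HTTP/1.1 socket."""

    def __init__(self, host, port, timeout=10.0):
        self.host = host
        self.sock = socket.create_connection((host, port), timeout=timeout)
        self.rfile = self.sock.makefile("rb")
        self.send_lock = threading.Lock()    # serializes writers on the socket
        self.cond = threading.Condition()    # guards `pending` and `closed`
        self.pending = collections.deque()   # one Future per request, FIFO
        self.closed = False
        threading.Thread(target=self._read_loop, daemon=True).start()

    def get(self, path):
        """Send a GET without waiting for earlier replies; return a Future."""
        fut = Future()
        fut.set_running_or_notify_cancel()   # callers can no longer cancel it
        data = ("GET %s HTTP/1.1\r\nHost: %s\r\n\r\n"
                % (path, self.host)).encode("ascii")
        with self.send_lock:                 # queue order == order on the wire
            with self.cond:
                if self.closed:
                    raise ConnectionError("connection is closed")
                self.pending.append(fut)
                self.cond.notify()
            self.sock.sendall(data)          # reader keeps draining meanwhile
        return fut

    def close(self):
        """Shut down once every already-queued request has been answered."""
        with self.cond:
            self.closed = True
            self.cond.notify()

    def _read_response(self):
        status = self.rfile.readline().decode("iso-8859-1").rstrip("\r\n")
        if not status:
            raise ConnectionError("server closed the connection")
        headers = {}
        while True:
            line = self.rfile.readline().decode("iso-8859-1").rstrip("\r\n")
            if not line:
                break
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        if "content-length" not in headers:
            raise ValueError("no Content-Length; cannot keep pipelining")
        return status, self.rfile.read(int(headers["content-length"]))

    def _read_loop(self):
        while True:
            with self.cond:
                while not self.pending and not self.closed:
                    self.cond.wait()
                if not self.pending:         # closed and everything answered
                    break
                fut = self.pending.popleft()
            try:
                fut.set_result(self._read_response())
            except Exception as exc:         # pipeline broken: fail all queued
                fut.set_exception(exc)
                with self.cond:
                    self.closed = True
                    while self.pending:
                        self.pending.popleft().set_exception(exc)
                break
        self.rfile.close()
        self.sock.close()
```

The send lock is held across `sendall`, but the condition lock that the reader needs is not, so a slow write never prevents the reader from draining responses (which would otherwise risk a deadlock once both socket buffers fill up).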

Python HTTP client with request pipelining
