I'm making several HTTP requests to a particular host using Python's urllib2 library. Each time a request is made, a new TCP and HTTP connection is created, which takes a noticeable amount of time. Is there any way to keep the TCP/HTTP connection alive with urllib2?
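For reference, a minimal sketch of the kind of code in question, assuming plain urllib2.urlopen calls and a placeholder host; each call opens its own TCP connection:
import urllib2

# Each urlopen() call opens a fresh TCP connection to the host
# (www.example.com is a placeholder for the real host)
for path in ('/foo', '/bar'):
    response = urllib2.urlopen('http://www.example.com' + path)
    data = response.read()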
Answer 0 (score: 27)
If you switch over to httplib, you will have finer control over the underlying connection.
For example:
import httplib

# HTTPConnection takes a host (and optional port), not a full URL
conn = httplib.HTTPConnection('www.example.com')
conn.request('GET', '/foo')
r1 = conn.getresponse()
r1.read()          # read the response in full before reusing the connection
conn.request('GET', '/bar')
r2 = conn.getresponse()
r2.read()
conn.close()
This sends two HTTP GETs over the same underlying TCP connection.
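On Python 3 the same module is available as http.client; a minimal sketch of the equivalent code, again with a placeholder host:
from http.client import HTTPConnection

conn = HTTPConnection('www.example.com')   # placeholder host
conn.request('GET', '/foo')
r1 = conn.getresponse()
r1.read()          # must be read in full before the next request on this connection
conn.request('GET', '/bar')
r2 = conn.getresponse()
r2.read()
conn.close()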
Answer 1 (score: 2)
I have used the third-party urllib3 library in the past and it worked well. It is designed to complement urllib2 by pooling connections so they can be reused.
A modified example from the wiki:
>>> from urllib3 import HTTPConnectionPool
>>> # Create a connection pool for a specific host
... http_pool = HTTPConnectionPool('www.google.com')
>>> # simple GET request, for example
... r = http_pool.urlopen('GET', '/')
>>> print r.status, len(r.data)
200 28050
>>> r = http_pool.urlopen('GET', '/search?q=hello+world')
>>> print r.status, len(r.data)
200 79124
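In newer urllib3 releases the PoolManager interface is the usual entry point; a minimal sketch assuming a current urllib3 install, using the same example URLs:
import urllib3

# PoolManager keeps a pool of connections per host and reuses them across requests
http = urllib3.PoolManager()
r = http.request('GET', 'http://www.google.com/')
print(r.status, len(r.data))
r = http.request('GET', 'http://www.google.com/search?q=hello+world')
print(r.status, len(r.data))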
Answer 2 (score: 0)
If you need something more automatic than plain httplib, this might help, though it is not thread-safe.
try:
    from http.client import HTTPConnection, HTTPSConnection
except ImportError:
    from httplib import HTTPConnection, HTTPSConnection
import select

# Cache of open connections, keyed by (scheme, host)
connections = {}

def request(method, url, body=None, headers={}, **kwargs):
    scheme, _, host, path = url.split('/', 3)
    h = connections.get((scheme, host))
    # If the cached socket is readable, the server has likely closed it
    # (or left stale data behind), so drop it and reconnect.
    if h and select.select([h.sock], [], [], 0)[0]:
        h.close()
        h = None
    if not h:
        Connection = HTTPConnection if scheme == 'http:' else HTTPSConnection
        h = connections[(scheme, host)] = Connection(host, **kwargs)
    h.request(method, '/' + path, body, headers)
    return h.getresponse()

def urlopen(url, data=None, *args, **kwargs):
    resp = request('POST' if data else 'GET', url, data, *args, **kwargs)
    assert resp.status < 400, (resp.status, resp.reason, resp.read())
    return resp
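A brief usage sketch of the helpers above, with a placeholder host; repeated calls to the same host reuse the cached connection as long as each response is read before the next request:
resp = urlopen('http://www.example.com/foo')
body = resp.read()     # read fully so the connection can be reused
resp = urlopen('http://www.example.com/bar')
body = resp.read()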