Exceeded 30 redirects after adding `Host` to headers [python requests]

Time: 2015-08-03 11:33:42

Tags: python python-2.7 http-headers python-requests

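Fetching a URL with `Host` set in the headers raises the exception "Exceeded 30 redirects". This is very strange and I can't figure out why. Here is the test code: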

>>> url = 'http://bbs.duchang8.com/forum-29-1.html'
>>> r = requests.get(url)
>>> print r.status_code
200
>>> headers = {
...     'Host': 'bbs.duchang8.com',
... }
>>> r = requests.get(url, headers=headers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 594, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 114, in resolve_redirects
    raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

1 answer:

Answer 0: (score: 3)

Short answer:

Don't override the Host: header.

Or, override it with the host that the client is redirected to.

Long answer:

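By explicitly setting the Host header, you are telling requests to use that header in all subsequent requests, including any requests that are re-issued because the server responded with a redirect.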

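In this case the requests client is being redirected to http://www.duchang8.com/forum-29-1.html, a location on a different host name: www.duchang8.com rather than bbs.duchang8.com. Although both host names resolve to the same IP address, the remote HTTP server treats them differently.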

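The net result is that requests keeps sending the Host: header that you provided, not the one that matches the location returned by the server. Subsequent requests to the new location are then rejected (by way of another redirect) because the host in the URL does not match the Host: header.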
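The mechanism can be modeled offline in a few lines. This is only a sketch written for Python 3: the `server` function is a hypothetical stand-in for the remote site, and its rule (redirect whenever Host is bbs.duchang8.com) is an assumption inferred from the observed responses, not the site's real configuration. `fetch` and `forced_host` are likewise illustrative names and are not part of requests.

```python
from urllib.parse import urlsplit

REDIRECT_TARGET = 'http://www.duchang8.com/forum-29-1.html'


def server(host_header):
    """Toy model of the remote server (assumed rule, inferred from its responses)."""
    if host_header == 'bbs.duchang8.com':
        return 301, REDIRECT_TARGET
    return 200, None


def fetch(url, forced_host=None, max_redirects=30):
    """Follow redirects; return (final status, number of redirects followed)."""
    redirects = 0
    while True:
        # A caller-supplied Host is reused on every hop; otherwise the
        # Host is derived from the URL currently being fetched.
        host = forced_host or urlsplit(url).netloc
        status, location = server(host)
        if status != 301:
            return status, redirects
        if redirects >= max_redirects:
            raise RuntimeError('Exceeded %d redirects.' % max_redirects)
        redirects += 1
        url = location


# Without a forced Host the single redirect is followed and the fetch succeeds.
print(fetch('http://bbs.duchang8.com/forum-29-1.html'))

# With the Host pinned, every hop is answered with the same 301.
try:
    fetch('http://bbs.duchang8.com/forum-29-1.html', forced_host='bbs.duchang8.com')
except RuntimeError as exc:
    print(exc)
```

In this model the pinned Host never matches the redirect target, so the redirect counter is the only thing that stops the loop.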

>>> import requests
>>> url = 'http://bbs.duchang8.com/forum-29-1.html'
>>> r = requests.get(url)
>>> r
<Response [200]>
>>> r.history
[<Response [301]>]
>>> r.history[0].headers
{'content-length': '178', 'server': 'nginx', 'connection': 'keep-alive', 'location': 'http://www.duchang8.com/forum-29-1.html', 'date': 'Mon, 03 Aug 2015 12:20:31 GMT', 'content-type': 'text/html'}

Here we see that the client was redirected with an HTTP 301 response and a location: header of http://www.duchang8.com/forum-29-1.html.

Using curl, you can see what happens when you supply a different Host: header while fetching the new location:

$ curl -v -L -H 'Host: bbs.duchang8.com' http://www.duchang8.com/forum-29-1.html
*   Trying 61.160.249.39...
* Connected to www.duchang8.com (61.160.249.39) port 80 (#0)
> GET /forum-29-1.html HTTP/1.1
> User-Agent: curl/7.40.0
> Accept: */*
> Host: bbs.duchang8.com
> 
< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Mon, 03 Aug 2015 12:27:33 GMT
< Content-Type: text/html
< Content-Length: 178
< Connection: keep-alive
< Location: http://www.duchang8.com/forum-29-1.html
< 
* Ignoring the response-body
* Connection #0 to host www.duchang8.com left intact
* Issue another request to this URL: 'http://www.duchang8.com/forum-29-1.html'
* Found bundle for host www.duchang8.com: 0x21b54c0
* Re-using existing connection! (#0) with host www.duchang8.com
* Connected to www.duchang8.com (61.160.249.39) port 80 (#0)
> GET /forum-29-1.html HTTP/1.1
> User-Agent: curl/7.40.0
> Accept: */*
> Host: bbs.duchang8.com
> 
< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Mon, 03 Aug 2015 12:27:33 GMT
< Content-Type: text/html
< Content-Length: 178
< Connection: keep-alive
< Location: http://www.duchang8.com/forum-29-1.html
<
# and so on, and so on....

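It ends up in a redirect loop. The same sequence of requests and responses happens with requests, which eventually decides that the loop is never going to end and aborts the request.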
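If a Host header really has to be sent explicitly, one way to apply the second option from the short answer is to follow the redirects by hand and recompute Host for every hop. The helper below is a hedged sketch for Python 3, not something requests provides: `get_with_matching_host` and its `get` parameter are hypothetical names, and `get` exists only so the logic can be exercised without network access (it defaults to `requests.get`).

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def get_with_matching_host(url, extra_headers=None, max_redirects=30, get=None):
    """GET a URL, following redirects manually with Host taken from each hop's URL.

    `get` is any callable with the calling convention of requests.get.
    """
    if get is None:
        import requests  # imported lazily so the helper can be tested offline
        get = requests.get

    for _ in range(max_redirects + 1):
        headers = dict(extra_headers or {})
        # Host always matches the URL being fetched, so it cannot go stale.
        headers['Host'] = urlsplit(url).netloc
        resp = get(url, headers=headers, allow_redirects=False)
        location = resp.headers.get('location')
        if resp.status_code not in REDIRECT_CODES or not location:
            return resp
        url = urljoin(url, location)  # the Location value may be relative
    raise RuntimeError('Exceeded %d redirects.' % max_redirects)
```

In most cases simply omitting Host from `headers` is enough, because requests derives it from the URL of each request it sends; the manual loop is only worth the extra code when other per-hop header logic is needed as well.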