我有一个我正在测试的简单网站。它在localhost上运行,我可以在我的Web浏览器中访问它。索引页面只是“运行”一词。 urllib.urlopen
会成功阅读该页面,但urllib2.urlopen
不会。这是一个演示问题的脚本(这是实际的脚本而不是简化不同的测试脚本):
import urllib, urllib2
print urllib.urlopen("http://127.0.0.1").read() # prints "running"
print urllib2.urlopen("http://127.0.0.1").read() # throws an exception
这是堆栈跟踪:
Traceback (most recent call last):
File "urltest.py", line 5, in <module>
print urllib2.urlopen("http://127.0.0.1").read()
File "C:\Python25\lib\urllib2.py", line 121, in urlopen
return _opener.open(url, data)
File "C:\Python25\lib\urllib2.py", line 380, in open
response = meth(req, response)
File "C:\Python25\lib\urllib2.py", line 491, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python25\lib\urllib2.py", line 412, in error
result = self._call_chain(*args)
File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
result = func(*args)
File "C:\Python25\lib\urllib2.py", line 575, in http_error_302
return self.parent.open(new)
File "C:\Python25\lib\urllib2.py", line 380, in open
response = meth(req, response)
File "C:\Python25\lib\urllib2.py", line 491, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python25\lib\urllib2.py", line 418, in error
return self._call_chain(*args)
File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
result = func(*args)
File "C:\Python25\lib\urllib2.py", line 499, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 504: Gateway Timeout
有什么想法吗?我可能最终需要urllib2
的一些更高级的功能,所以我不想只使用urllib
,而且我想了解这个问题。
答案 0 :(得分:16)
听起来你已经定义了urllib2正在接收的代理设置。当它尝试代理“127.0.0.01/”时,代理放弃并返回504错误。
来自Obscure python urllib2 proxy gotcha:
proxy_support = urllib2.ProxyHandler({})
opener = urllib2.build_opener(proxy_support)
print opener.open("http://127.0.0.1").read()
# Optional - makes this opener default for urlopen etc.
urllib2.install_opener(opener)
print urllib2.urlopen("http://127.0.0.1").read()
答案 1 :(得分:1)
首先调用urlib2.open后跟urllib.open会有相同的结果吗?只是想知道第一次打开是否导致http服务器忙于导致超时?
答案 2 :(得分:1)
我不知道发生了什么,但你可能会发现这有助于搞清楚:
>>> import urllib2
>>> urllib2.urlopen('http://mit.edu').read()[:10]
'<!DOCTYPE '
>>> urllib2._opener.handlers[1].set_http_debuglevel(100)
>>> urllib2.urlopen('http://mit.edu').read()[:10]
connect: (mit.edu, 80)
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: mit.edu\r\nConnection: close\r\nUser-Agent: Python-urllib/2.5\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 14 Oct 2008 15:52:03 GMT
header: Server: MIT Web Server Apache/1.3.26 Mark/1.5 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.7c
header: Last-Modified: Tue, 14 Oct 2008 04:02:15 GMT
header: ETag: "71d3f96-2895-48f419c7"
header: Accept-Ranges: bytes
header: Content-Length: 10389
header: Connection: close
header: Content-Type: text/html
'<!DOCTYPE '
答案 3 :(得分:1)
urllib.urlopen()在服务器上抛出以下请求:
GET / HTTP/1.0
Host: 127.0.0.1
User-Agent: Python-urllib/1.17
而urllib2.urlopen()抛出这个:
GET / HTTP/1.1
Accept-Encoding: identity
Host: 127.0.0.1
Connection: close
User-Agent: Python-urllib/2.5
因此,您的服务器要么不了解HTTP / 1.1或额外的标头字段。