urlopen only works for some URLs in Python 3

Date: 2015-08-09 22:16:36

Tags: python python-3.x urllib urlopen

So I'm trying to fetch a web page from its URL in Python 3.

If I do the following,

from urllib.request import urlopen
html = urlopen("http://google.com/")
html.read()

I get the HTML as expected. However, if I pick a different URL, like this,

from urllib.request import urlopen
html = urlopen("http://www.stackoverflow.com/")
html.read() 

I get the following error after the second line:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 461, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 574, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 499, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 582, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Why does this happen, and how can I fix it?

1 answer:

Answer (score: 2)

If you look closely at the error message, you will see that it is an HTTP error, and a specific one:

HTTP Error 403: Forbidden

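So you did reach the server and got a response; you just don't know yet why you were refused.

You can get a more detailed message from the HTML that the server sends back with the error, like this: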

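from urllib.request import urlopen
from urllib.error import HTTPError

try:
    html = urlopen("http://www.stackoverflow.com/")
except HTTPError as e:
    # The exception carries the error page the server sent with the 403
    print(e.read().decode('utf-8'))
else:
    # Only reached if the request succeeded, so `html` is defined here
    html.read()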

For me, it says:

<h2 data-translate="what_happened">What happened?</h2>
<p>The owner of this website (www.stackoverflow.com) has banned your access based on your browser's signature (213702c58d2116a6-ua48).</p>
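The message says the block is based on your browser's signature. A common cause is urllib's default User-Agent header (`Python-urllib/3.x`), so one thing to try is sending a different one. Below is a minimal sketch, assuming the User-Agent is what the site keys on; `BROWSER_UA`, `build_request` and `fetch` are hypothetical names, and the header value is only an example. Whether a given site accepts the request afterwards is up to that site.

```python
from urllib.request import Request, urlopen

# Hypothetical browser-like User-Agent; assumption: the site blocks
# urllib's default "Python-urllib/3.x" value.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"

def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that sends a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})

def fetch(url):
    """Open the URL with the custom header and return the raw body bytes."""
    # urlopen accepts a Request object in place of a plain URL string
    with urlopen(build_request(url)) as resp:
        return resp.read()
```

Usage would be `fetch("http://www.stackoverflow.com/")`. Note that `Request` normalizes header names with `str.capitalize()`, so the stored key is `User-agent`.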

You can treat an HTTPError like a file object (https://docs.python.org/3/library/urllib.error.html#urllib.error.HTTPError):

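Though being an exception (a subclass of URLError), an HTTPError can also function as a non-exceptional file-like return value (the same thing that urlopen() returns). This is useful when handling exotic HTTP errors, such as requests for authentication.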
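To illustrate the file-like behaviour without touching the network, the sketch below builds an HTTPError by hand (standing in for the 403 above; the body bytes are made up) and reads it through the same `getcode()`/`read()` calls you would use on a normal `urlopen()` response. `read_any` is a hypothetical helper name.

```python
import io
from urllib.error import HTTPError, URLError

def read_any(resp_or_error):
    """Return (status code, body bytes) from a response or an HTTPError alike."""
    # Both objects expose getcode() and read(), so no special-casing is needed
    return resp_or_error.getcode(), resp_or_error.read()

# Hand-built HTTPError standing in for the 403; the body is illustrative only
err = HTTPError(
    url="http://www.stackoverflow.com/",
    code=403,
    msg="Forbidden",
    hdrs={},
    fp=io.BytesIO(b"<h2>What happened?</h2>"),
)

print(isinstance(err, URLError))  # True: it is still an exception
print(read_any(err))              # (403, b'<h2>What happened?</h2>')
```

Because the body is read from a stream, calling `read()` a second time on the same error returns `b''`.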