Python urlopen error 404 on a directory

Time: 2013-10-21 18:49:01

Tags: php python-3.x beautifulsoup urlopen

I have this code:

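from urllib.request import urlopen
from bs4 import BeautifulSoup

page = urlopen("http://www.doctoralia.com")
soup = BeautifulSoup(page, "html.parser")  # pass the parser explicitly instead of letting bs4 guess
# "with" closes the file automatically; utf-8 avoids encode errors on Windows' default codec
with open('data.txt', 'w', encoding='utf-8') as myfile:
    myfile.write(soup.prettify())
print('done boy !')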

It works fine! But when I change urlopen("http://www.doctoralia.com") to urlopen("http://www.doctoralia.com/healthpros"), it throws this error:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    page = urlopen("http://www.doctoralia.com/healthpros")
  File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 513, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

What is the problem? Thanks.

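1 answer:

Answer 0 (score: 1)

urlopen raises HTTPError because the server answers this URL with an error status (here 404 Not Found) instead of a normal page. If you still want to see the HTML the server sent back, you have to handle this HTTPError. For example: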

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

try:
    page = urlopen("http://www.doctoralia.com/healthpros")
except HTTPError as e:
    if e.code == 404:
        # HTTPError also behaves like a response object:
        # e.read() returns the body of the 404 page the server sent back
        soup = BeautifulSoup(e.read(), "html.parser")
        print(soup.prettify())

If the server answers with a 404 HTTPError, this prints the HTML of the error page it sent.
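To see why reading the exception works, here is a minimal offline sketch: an HTTPError carries the status code and, like a normal response, a readable body. It builds the exception by hand instead of contacting doctoralia.com; the URL, the page body and the helper name describe_http_error are hypothetical, and the body is assumed to be UTF-8.

```python
import io
from urllib.error import HTTPError


def describe_http_error(err):
    """Return (status code, decoded body) from an HTTPError (assumes a UTF-8 body)."""
    # HTTPError doubles as a file-like response, so read() returns the error page body
    return err.code, err.read().decode("utf-8", errors="replace")


# Build the exception by hand (hypothetical URL and body) instead of contacting a server
fake_page = io.BytesIO(b"<html><body><h1>Not Found</h1></body></html>")
err = HTTPError("http://example.com/missing", 404, "Not Found", {}, fake_page)

code, body = describe_http_error(err)
print(code)  # 404
print(body)  # <html><body><h1>Not Found</h1></body></html>
```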

You can remove the if statement to do this for every HTTPError, not just 404.
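Without the if statement, the same idea can be wrapped in a helper that returns the page for both successful and error responses. This is a sketch under a few assumptions: fetch_html and DemoHandler are hypothetical names, the body is assumed to be UTF-8, and a throwaway local server from the standard library stands in for doctoralia.com so the example runs offline. BeautifulSoup is left out to keep it stdlib-only; you could pass the returned HTML to BeautifulSoup(html, "html.parser").

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, build_opener, install_opener, urlopen


def fetch_html(url):
    """Return (status code, decoded body) for url, even if the server answers with an HTTP error."""
    try:
        with urlopen(url) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as e:
        # No status check: 404, 403, 500, ... are all handled the same way,
        # and e.read() returns the error page the server sent back
        return e.code, e.read().decode("utf-8", errors="replace")


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: "/" answers 200, every other path answers 404."""

    def do_GET(self):
        if self.path == "/":
            status, body = 200, b"<html><body>ok</body></html>"
        else:
            status, body = 404, b"<html><body>missing</body></html>"
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


# Demo only: ignore any http_proxy environment variables so requests reach the local server
install_opener(build_opener(ProxyHandler({})))

server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

ok = fetch_html(base + "/")
missing = fetch_html(base + "/healthpros")
server.shutdown()
server.server_close()

print(ok)       # (200, '<html><body>ok</body></html>')
print(missing)  # (404, '<html><body>missing</body></html>')
```

Returning the status code alongside the body lets the caller still tell a real page from an error page, which the original try/except loses once the if statement is removed.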