I am using this simple code
    for l in bios:
        OpenThisLink = url + l
        response = urllib2.urlopen(OpenThisLink)
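to open about 200 URLs and search them with regular expressions (and BeautifulSoup), but after a dozen or so I get these errors and IDLE quits. What do they mean, and how do I handle them?
Thanks.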
    Traceback (most recent call last):
      File "\PROJECTS\JD\jd10.py", line 15, in <module>
        response = urllib2.urlopen(OpenThisLink)
      File "C:\Python26\lib\urllib2.py", line 124, in urlopen
        return _opener.open(url, data, timeout)
      File "C:\Python26\lib\urllib2.py", line 389, in open
        response = meth(req, response)
      File "C:\Python26\lib\urllib2.py", line 502, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Python26\lib\urllib2.py", line 421, in error
        result = self._call_chain(*args)
      File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
        result = func(*args)
      File "C:\Python26\lib\urllib2.py", line 597, in http_error_302
        return self.parent.open(new)
      File "C:\Python26\lib\urllib2.py", line 389, in open
        response = meth(req, response)
      File "C:\Python26\lib\urllib2.py", line 502, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Python26\lib\urllib2.py", line 421, in error
        result = self._call_chain(*args)
      File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
        result = func(*args)
      File "C:\Python26\lib\urllib2.py", line 597, in http_error_302
        return self.parent.open(new)
      File "C:\Python26\lib\urllib2.py", line 389, in open
        response = meth(req, response)
      File "C:\Python26\lib\urllib2.py", line 502, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Python26\lib\urllib2.py", line 427, in error
        return self._call_chain(*args)
      File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
        result = func(*args)
      File "C:\Python26\lib\urllib2.py", line 510, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    HTTPError: HTTP Error 404: Not Found
Answer 0 (score: 3)
The error being raised is an HTTPError; specifically, one of your URLs is returning a 404 (Not Found). You can ignore it:
    for l in bios:
        OpenThisLink = url + l
        try:
            response = urllib2.urlopen(OpenThisLink)
        except urllib2.HTTPError:
            # Skip this URL so later code never sees a missing or stale response
            continue
Alternatively, you can re-raise the error with a (slightly) more meaningful message:
    for l in bios:
        OpenThisLink = url + l
        try:
            response = urllib2.urlopen(OpenThisLink)
        except urllib2.HTTPError as e:
            raise Exception('Error opening %s: %s' % (e.geturl(), e))
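A middle ground between silently ignoring the error and aborting the whole run is to record which URLs failed and carry on with the rest. The following is only a rough sketch of that idea, not something from the question: the function name `fetch_all`, its `opener` parameter, and the `pages`/`failures` names are all made up for illustration, and the import fallback is there only so the same code also runs on Python 3, where `urllib2` was split into `urllib.request` and `urllib.error`.

```python
try:
    import urllib2 as request_mod            # Python 2, as in the question
    from urllib2 import HTTPError
except ImportError:
    import urllib.request as request_mod     # Python 3 equivalent
    from urllib.error import HTTPError


def fetch_all(base_url, paths, opener=None):
    """Open base_url + path for every path and return (pages, failures).

    `opener` is a hypothetical hook: any callable that takes a URL and
    returns a file-like response. It defaults to urlopen.
    """
    if opener is None:
        opener = request_mod.urlopen
    pages = {}      # full URL -> response body
    failures = []   # (full URL, HTTP status code) for each failed request
    for path in paths:
        link = base_url + path
        try:
            response = opener(link)
        except HTTPError as e:
            # The server answered with an error status such as 404:
            # remember it and move on to the next URL.
            failures.append((link, e.code))
            continue
        try:
            pages[link] = response.read()
        finally:
            response.close()
    return pages, failures
```

With the names from the question, this would be called as `pages, failures = fetch_all(url, bios)`; afterwards `failures` lists every link that could not be opened together with its status code, so the bad entries in `bios` can be checked by hand.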
Answer 1 (score: 2)
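I don't know anything about the specific library you're using. However, this looks to me like one long stack trace leading up to the final, original error:
    HTTPError: HTTP Error 404: Not Found
I think one of the links is bad and triggered an exception that was never caught.
Edit: by "bad" I mean that the server could not retrieve the page, hence the 404 error.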