I'm using Python 3 and urllib.request.build_opener to scrape many web pages from one website. Each web_page_url is opened as follows:
_masterOpener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))
_masterOpener.addheaders = [('Cookie', some_cookie)]
request = _masterOpener.open(web_page_url)
content = request.read()
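It always runs fine for the first few hundred pages, roughly 10 minutes (I have tried several times), and then fails with the following error: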
  File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 525: Origin SSL Handshake Error
I searched online but couldn't find a solution. How can I fix the 'urllib.error.HTTPError: HTTP Error 525: Origin SSL Handshake Error' shown above?
Answer (score: 2)
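HTTP 5xx status codes indicate an error on the server side, and it is up to you to handle them gracefully (for example, without crashing your crawler).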
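As a minimal sketch of what "handling gracefully" can look like when pages are fetched in a loop, the hypothetical `crawl` helper below logs and skips pages that fail with any 5xx status, and still raises on other HTTP errors such as 404. The `fetch` parameter is an assumption added so the loop can be exercised without network access; `make_fetch` (also hypothetical) shows one way to build it from a single cookie-aware opener that is reused for every page instead of being rebuilt per URL.

```python
import logging
import urllib.error
import urllib.request
from http.cookiejar import CookieJar

log = logging.getLogger("crawler")


def make_fetch(cookie):
    """Hypothetical helper: build one cookie-aware opener and reuse it for every page."""
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))
    opener.addheaders = [("Cookie", cookie)]

    def fetch(url):
        # Close the response once its body has been read.
        with opener.open(url) as response:
            return response.read()

    return fetch


def crawl(urls, fetch):
    """Fetch each URL; log and skip pages that fail with a 5xx status."""
    pages, failed = {}, {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except urllib.error.HTTPError as e:
            if 500 <= e.code < 600:
                # Server-side error (e.g. CloudFlare's 525): record it and move on.
                log.warning("HTTP %d for %s, skipping", e.code, url)
                failed[url] = e.code
            else:
                # Other codes (e.g. 4xx) are re-raised so they are not silently ignored.
                raise
    return pages, failed
```

Usage would be along the lines of `pages, failed = crawl(urls, make_fetch(some_cookie))`; the `failed` dictionary keeps the skipped URLs so they can be retried later rather than lost.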
In this case, error 525 appears to be CloudFlare-specific: it means CloudFlare could not complete the SSL handshake with the original (origin) site, for example because that connection timed out.
So simply add a try ... except clause to handle this error gracefully:
import urllib.error
import urllib.request
from http.cookiejar import CookieJar

try:
    _masterOpener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))
    _masterOpener.addheaders = [('Cookie', some_cookie)]
    request = _masterOpener.open(web_page_url)
    content = request.read()
except urllib.error.HTTPError as e:
    if e.code == 525:
        # Possible issue with CloudFlare: skip this page instead of crashing
        # TODO: log a warning about the broken url
        content = None
    else:
        # TODO: ... handle all the other 5xx errors
        # Re-raise the original exception for anything not handled above
        raise
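If the 525 errors turn out to be transient (an assumption: the question only says they start after roughly ten minutes of crawling), retrying the page after a pause may succeed where skipping it would lose data. The following is a minimal sketch of retry with exponential backoff; `fetch_with_retry`, its defaults, and the set of retried status codes are hypothetical choices, and `sleep` is a parameter only so the delays can be checked without actually waiting.

```python
import time
import urllib.error


def fetch_with_retry(fetch, url, attempts=3, base_delay=2.0,
                     retry_on=frozenset({502, 503, 504, 525}), sleep=time.sleep):
    """Return fetch(url), retrying with exponential backoff on selected HTTP codes.

    Assumes attempts >= 1. The retried codes are an illustrative choice.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as e:
            # Give up immediately on codes we don't retry, or after the last attempt.
            if e.code not in retry_on or attempt == attempts - 1:
                raise
            # Wait base_delay, then twice that, then four times, ... before retrying.
            sleep(base_delay * 2 ** attempt)
```

For example, `content = fetch_with_retry(lambda u: _masterOpener.open(u).read(), web_page_url)`. Because the helper still raises after the last attempt, it is meant to sit inside a try ... except like the one in this answer rather than replace it.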