Python: urllib.error.HTTPError: HTTP Error 525: Origin SSL Handshake Error

Time: 2018-02-11 11:10:56

Tags: python ssl urllib

I am using Python 3 with urllib.request.build_opener to crawl many pages of a website. Each web_page_url is opened like this:

import urllib.request
from http.cookiejar import CookieJar

_masterOpener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))
_masterOpener.addheaders = [('Cookie', some_cookie)]
request = _masterOpener.open(web_page_url)
content = request.read()

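It always runs fine for the first few hundred pages, roughly 10 minutes of crawling (I have tried several times), and then fails with the following error: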

File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 532, in open
response = meth(req, response)
File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 570, in error
return self._call_chain(*args)
File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "D:\ProgramData\Anaconda3\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 525: Origin SSL Handshake Error

I searched online and could not find a solution. How can I fix the 'urllib.error.HTTPError: HTTP Error 525: Origin SSL Handshake Error' shown above?

1 answer:

Answer 0 (score: 2)

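HTTP 5xx status codes indicate an error on the server side, and it is your responsibility to handle them gracefully (for example, by not letting them crash the crawler).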

In this case, error 525 appears to be CloudFlare-specific: it is returned when the SSL handshake between CloudFlare and the origin site fails.
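To tell whether a given error actually came from CloudFlare rather than from the site itself, one option is to inspect the response headers carried by the exception. The following is only a heuristic sketch: the helper name is_cloudflare_error is hypothetical, and it assumes the proxy identifies itself through a Server: cloudflare or CF-RAY header, which may not hold for every response.

```python
import urllib.error
from email.message import Message  # header container type used by urllib


def is_cloudflare_error(err: urllib.error.HTTPError) -> bool:
    """Heuristic: True if the error response looks like it came from CloudFlare.

    Assumption: CloudFlare marks its responses with a 'Server: cloudflare'
    header and/or a 'CF-RAY' header.
    """
    headers = err.headers
    if headers is None:
        return False
    server = (headers.get("Server") or "").lower()
    return "cloudflare" in server or headers.get("CF-RAY") is not None
```

A crawler could call this inside its except urllib.error.HTTPError block to decide whether a 5xx response is worth retrying later or should be reported as a problem with the site.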

So simply add a try ... except clause to handle this error gracefully:

import logging
import urllib.error
import urllib.request
from http.cookiejar import CookieJar

try:
    _masterOpener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))
    _masterOpener.addheaders = [('Cookie', some_cookie)]
    request = _masterOpener.open(web_page_url)
    content = request.read()
except urllib.error.HTTPError as e:
    if e.code == 525:
        # Possible issue with CloudFlare: log the broken URL and carry on
        # instead of crashing the crawler.
        logging.warning("HTTP 525 for %s, skipping", web_page_url)
        content = None
    else:
        # TODO: ... handle all the other 5xx errors
        # Re-raise anything that is not handled here
        raise
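
Since the crawl reportedly works for a few hundred pages before failing, the 525 may be transient, in which case retrying after a short pause can help. The following is a minimal sketch of that idea, not part of the original answer: fetch_with_retry and its parameters (retries, base_delay, retry_codes, sleep) are hypothetical names, and the delays are arbitrary starting points.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(opener, url, retries=3, base_delay=1.0,
                     retry_codes=(525,), sleep=time.sleep):
    """Return the page body as bytes, or None if every attempt failed.

    Only HTTP errors whose code is in retry_codes are retried; any other
    HTTPError is re-raised immediately.
    """
    for attempt in range(retries):
        try:
            with opener.open(url) as response:
                return response.read()
        except urllib.error.HTTPError as e:
            if e.code not in retry_codes:
                raise
            if attempt < retries - 1:
                # Exponential backoff: base_delay, 2 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    return None
```

With the opener from the question, usage would look like content = fetch_with_retry(_masterOpener, web_page_url); a None result means the URL should be logged and skipped. Slowing the crawl down in general may also reduce how often the error appears, although that is a guess rather than something the traceback confirms.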