I'm trying to scrape a site whose certificate can't be verified. I'm getting the following error:
ERROR: Certificate did not match expected hostname
The scraper used to work, but I guess something changed on the server. I looked at the docs and found:
https://doc.scrapy.org/en/latest/topics/settings.html#downloader-clientcontextfactory
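It states: "Scrapy default context factory does NOT perform remote server certificate verification. This is usually fine for web scraping."
I don't have a context factory in my settings, so it is strange that the certificate is being verified at all. Or do I perhaps need to set the context factory explicitly?
Anyway, if anyone can point me to a solution, it would be much appreciated.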
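For reference, setting the context factory explicitly would look roughly like the sketch below. It assumes a standard project settings.py; the value shown is the default that the Scrapy documentation lists for DOWNLOADER_CLIENTCONTEXTFACTORY, so it should only restate the behaviour already in effect.

```python
# settings.py (sketch, assuming a standard Scrapy project layout)
# This is the documented default value of the setting; writing it out
# explicitly should not change how the downloader handles certificates.
DOWNLOADER_CLIENTCONTEXTFACTORY = (
    "scrapy.core.downloader.contextfactory.ScrapyClientContextFactory"
)
```

This setting only applies to requests made through Scrapy's own downloader.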
Scrapy version info:
Scrapy : 1.1.0
lxml : 3.6.0.0
libxml2 : 2.9.4
Twisted : 16.3.0
Python : 2.7.10
pyOpenSSL : 0.13.1 (OpenSSL 0.9.8zh 14 Jan 2016)
Platform : Darwin-16.0.0-x86_64-i386-64bit
Full stack trace:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/dist-packages/sh_scrapy/extension.py", line 65, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 54, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/app/__main__.egg/meh/spiders/meh_spider.py", line 411, in parse_list
    rloc = requests.get(location_url)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 65, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 49, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 461, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 599, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 192, in resolve_redirects
    allow_redirects=False,
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 431, in send
    raise SSLError(e, request=request)
SSLError: hostname 'items.xaha.ca' doesn't match either of '*.xah.ca', 'xah.ca'
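
Reading the traceback, the SSLError appears to be raised by the requests.get call in parse_list (meh_spider.py, line 411) rather than by Scrapy's downloader, and the hostname in the message (xaha.ca) differs from the names on the certificate (xah.ca). The sketch below illustrates why such a check fails. It is a simplified wildcard match written for illustration, not the code requests actually runs, and hostname_matches is a hypothetical helper name.

```python
def hostname_matches(hostname, pattern):
    """Simplified check: '*' may stand for exactly one leftmost label.

    Illustration only; real TLS hostname verification has more rules.
    """
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    # A wildcard covers a single label, so the label counts must agree.
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # Ignore the leftmost label and compare the remaining ones exactly.
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


cert_names = ["*.xah.ca", "xah.ca"]  # names quoted in the SSLError
requested = "items.xaha.ca"          # hostname quoted in the SSLError

# The requested host matches neither certificate name.
print(any(hostname_matches(requested, name) for name in cert_names))  # False
# A host under xah.ca would match the wildcard entry.
print(hostname_matches("items.xah.ca", "*.xah.ca"))  # True
```

If that reading is right, the mismatch lies between the URL passed to requests.get and the certificate the server presents. requests.get does accept verify=False to skip the check for a single call, at the cost of disabling certificate verification for that request entirely.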