Scrapy: ignoring certificate verification

Time: 2016-10-06 00:35:31

Tags: python scrapy python-requests

I am trying to scrape a website whose certificate cannot be verified. I get the following error:

ERROR: Certificate did not match expected hostname

The scraper used to work, but I guess something changed on the server. I looked at the documentation and found:

https://doc.scrapy.org/en/latest/topics/settings.html#downloader-clientcontextfactory

It states: "Scrapy default context factory does NOT perform remote server certificate verification. This is usually fine for web scraping."

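I have no context factory in my settings, so it is odd that the certificate is being verified at all. Or do I perhaps need to set a context factory?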
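For reference, a minimal sketch of what setting the context factory explicitly in a project's settings.py could look like. The value shown is the default given in the documentation linked above, so adding it should not change behaviour; it is included only to show the shape of the setting. As far as I understand it, this setting only governs requests that go through Scrapy's own downloader.

```python
# settings.py (sketch): explicitly select Scrapy's default TLS context factory.
# This is the documented default value, which does not verify the remote
# server's certificate, so leaving the setting out should be equivalent.
DOWNLOADER_CLIENTCONTEXTFACTORY = (
    "scrapy.core.downloader.contextfactory.ScrapyClientContextFactory"
)
```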

In any case, I would appreciate it if someone could point me to a solution.

Scrapy version info:

Scrapy    : 1.1.0
lxml      : 3.6.0.0
libxml2   : 2.9.4
Twisted   : 16.3.0
Python    : 2.7.10
pyOpenSSL : 0.13.1 (OpenSSL 0.9.8zh 14 Jan 2016)
Platform  : Darwin-16.0.0-x86_64-i386-64bit

Full stack trace:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/dist-packages/sh_scrapy/extension.py", line 65, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 54, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/app/__main__.egg/meh/spiders/meh_spider.py", line 411, in parse_list
    rloc = requests.get(location_url)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 65, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 49, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 461, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 599, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 192, in resolve_redirects
    allow_redirects=False,
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 431, in send
    raise SSLError(e, request=request)
SSLError: hostname 'items.xaha.ca' doesn't match either of '*.xah.ca', 'xah.ca'
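
The last line of the trace names the requested hostname and the two names the certificate presents. Below is a simplified sketch of the wildcard comparison behind that message, assuming the common rule that `*` stands for exactly one leftmost label. `hostname_matches` is a hypothetical helper written for illustration, not the function that requests or OpenSSL actually calls.

```python
def hostname_matches(hostname, pattern):
    """Return True if hostname matches a certificate name such as '*.xah.ca'.

    Simplified: a '*' is only honoured as the entire leftmost label and
    stands for exactly one label.
    """
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    # A wildcard never spans dots, so the label counts must agree.
    if len(host_labels) != len(pattern_labels):
        return False
    # Every label after the first must match exactly.
    if host_labels[1:] != pattern_labels[1:]:
        return False
    return pattern_labels[0] == "*" or pattern_labels[0] == host_labels[0]


cert_names = ["*.xah.ca", "xah.ca"]  # names taken from the SSLError above
requested = "items.xaha.ca"          # hostname taken from the SSLError above

# 'xaha.ca' differs from 'xah.ca', so neither certificate name matches.
print(any(hostname_matches(requested, name) for name in cert_names))
```

The trace also shows that the failing call is `requests.get(location_url)` inside the spider, so the check is being performed by requests, which verifies certificates unless told otherwise through its `verify` argument, rather than by Scrapy's downloader.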

0 answers:

No answers