For this site: https://www.cnbanbao.cn/
I tried this command on a Mac:
openssl s_client -connect www.cnbanbao.cn:443 -msg
Result:
>>> TLS 1.2 Handshake [length 00c3], ClientHello
...
<<< TLS 1.2 Handshake [length 0051], ServerHello
...
<<< TLS 1.0 Handshake [length 0a4a], Certificate
...
<<< TLS 1.0 Handshake [length 0004], ServerHelloDone
...
>>> TLS 1.0 Handshake [length 0106], ClientKeyExchange
...
>>> TLS 1.0 ChangeCipherSpec [length 0001]
...
>>> TLS 1.0 Handshake [length 0010], Finished
I think the problem may be that the site uses TLS 1.2 for the ServerHello but TLS 1.0 for the rest of the handshake, which causes the download to fail when I try to fetch the site with scrapy:
scrapy shell 'https://www.cnbanbao.cn/'
2019-01-24 11:49:57 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.cnbanbao.cn/> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2019-01-24 11:49:58 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.cnbanbao.cn/> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2019-01-24 11:49:59 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.cnbanbao.cn/> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
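One way to check the TLS-version guess independently of Scrapy is to attempt a handshake pinned to a single protocol version from Python's standard `ssl` module. The sketch below is only a diagnostic under assumptions: `build_context` and `probe` are hypothetical helper names, certificate verification is deliberately switched off, and a local OpenSSL build may refuse TLS 1.0 entirely, in which case the failure says nothing about the server.

```python
import socket
import ssl


def build_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Return a client context that only offers the given TLS version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Verification is disabled only because this is a diagnostic probe.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # Pin both bounds so no other protocol version can be negotiated.
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe(host: str, version: ssl.TLSVersion, port: int = 443,
          timeout: float = 10.0) -> str:
    """Try one handshake; return the negotiated version or the error text."""
    try:
        ctx = build_context(version)
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.version() or "unknown"
    except (OSError, ValueError) as exc:
        # ssl.SSLError and connection resets are OSError subclasses;
        # ValueError covers a protocol version the local build rejects.
        return f"failed: {exc!r}"
```

Calling `probe("www.cnbanbao.cn", ssl.TLSVersion.TLSv1)` and then `probe("www.cnbanbao.cn", ssl.TLSVersion.TLSv1_2)` would show which pinned version, if any, completes the handshake.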
I tried specifying TLS 1.0 as described in "SSL issue when scraping website", but it does not seem to work.
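A minimal sketch of pinning the TLS version in Scrapy, assuming the approach in the linked question relies on the `DOWNLOADER_CLIENT_TLS_METHOD` setting and that the setting lives in a project-level settings.py:

```python
# settings.py (Scrapy project settings; the file location is an assumption)

# Pin the TLS method used by Scrapy's HTTPS client.
# The default is "TLS" (negotiate); "TLSv1.0", "TLSv1.1" and "TLSv1.2"
# force a specific protocol version.
DOWNLOADER_CLIENT_TLS_METHOD = "TLSv1.0"
```

Outside a project, the same setting can be passed on the command line with `-s`, for example `scrapy shell -s DOWNLOADER_CLIENT_TLS_METHOD=TLSv1.0 'https://www.cnbanbao.cn/'`.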
I also tried "Disable SSL certificate verification in Scrapy", but I don't know how to define HttpsDownloaderIgnoreCNError to disable SSL verification.
Any ideas on how to make the following command work?
scrapy shell 'https://www.cnbanbao.cn/'