Scrapy proxy IP doesn't work with HTTPS, returns 'ssl handshake failure'

Date: 2016-03-28 19:42:38

Tags: python scrapy twisted scrapy-spider pyopenssl

Scrapy works with my proxy IP for HTTP requests, but not for HTTPS requests.

I know my proxy IP works for HTTP because I tested it by sending a request to http://ipinfo.io/ip:

2016-03-28 12:10:42 [scrapy] DEBUG: Crawled (200) <GET http://ipinfo.io/ip> (referer: http://www.google.com)
2016-03-28 12:10:42 [root] INFO:  *** TEST, WHAT IS MY IP: ***
107.183.7.XX

I know it fails on HTTPS requests because of this error message:

2016-03-28 12:10:55 [scrapy] DEBUG: Gave up retrying <GET https://www.my-company-url.com> (failed 3 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]

My settings.py contains:

DOWNLOADER_MIDDLEWARES = {
    'crystalball.middlewares.ProxyMiddleware': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110
}

My crystalball.middlewares.ProxyMiddleware contains:

import base64

class ProxyMiddleware(object):

    def process_request(self, request, spider):
        request.meta['proxy'] = "https://107.183.X.XX:55555"
        proxy_user_pass = "hXXbp3:LitSwDXX99"
        encoded_user_pass = base64.encodestring(proxy_user_pass)
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

Any suggestions on what I should try next?

Side note: the solutions in this SO post did not work: Scrapy and proxies

1 Answer:

Answer 0 (score: 4):

The culprit is base64.encodestring(), which appends an unwanted newline (\n) character to the value of the request's Proxy-Authorization header.

The fix is simply to strip() off the \n.

Change this line:

request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

to this:

request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass.strip()
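A quick way to see the problem for yourself (a minimal sketch; on Python 3, encodestring was renamed encodebytes and removed entirely in 3.9, and the input must be bytes):

```python
import base64

credentials = b"hXXbp3:LitSwDXX99"

# encodebytes (Python 3's name for encodestring) appends a trailing
# newline, which corrupts the Proxy-Authorization header value:
encoded = base64.encodebytes(credentials)
print(encoded.endswith(b"\n"))  # True

# Stripping it gives a header-safe value; note that base64.b64encode
# never adds the newline in the first place:
print(encoded.strip() == base64.b64encode(credentials))  # True
```

Using base64.b64encode() from the start avoids the issue entirely, with no strip() needed.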