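I have an SSL proxy server and I want to scrape https sites. That is, the connection between Scrapy and the proxy is encrypted, and the proxy then opens the connection to the website.

After some debugging, I found that Scrapy currently behaves as follows: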
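- If the site is http, it uses ScrapyProxyAgent, which sends a TLS client hello to the proxy and then sends the request for the website to the proxy.
- If the site is https, it uses TunnelingAgent, which does not send a client hello to the proxy, so the connection is terminated.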
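The selection rule described above can be modelled as a small pure function. This is a sketch of the decision as it appears in the Scrapy 1.x `ScrapyAgent` code that the handler below is copied from (an https URL goes through `TunnelingAgent` unless the proxy URL contains `noconnect`). The function name `choose_proxy_agent` and its return values are hypothetical, and nothing here opens a connection.

```python
from typing import Tuple
from urllib.parse import urlsplit


def choose_proxy_agent(url: str, proxy: str) -> Tuple[str, str]:
    """Hypothetical model: return (agent name, transport used for the hop to the proxy)."""
    proxy_parts = urlsplit(proxy)
    # Scrapy checks for b'noconnect' in the path/query part of the proxy URL
    omit_connect_tunnel = "noconnect" in (proxy_parts.path + proxy_parts.query)
    if urlsplit(url).scheme == "https" and not omit_connect_tunnel:
        # CONNECT is written over a plain TCP connection to proxy host:port,
        # so no TLS client hello ever reaches an SSL-only proxy
        return ("TunnelingAgent", "tcp")
    # The request is sent to the proxy URI itself; an https:// proxy URI means TLS
    transport = "tls" if proxy_parts.scheme == "https" else "tcp"
    return ("ScrapyProxyAgent", transport)
```

With an `https://` proxy, `choose_proxy_agent("https://example.com/", "https://proxy.example:8443")` yields `("TunnelingAgent", "tcp")`, which matches the symptom: the https case is the only one where the proxy hop is not encrypted.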
I need to tell Scrapy to first establish the connection through ScrapyProxyAgent and then use TunnelingAgent, but I'm not sure how to do that.
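The order being asked for is: TLS handshake with the proxy, then an HTTP `CONNECT host:port` request sent inside that TLS session, then a second TLS handshake with the target site inside the tunnel. Only the middle step is plain HTTP. The sketch below builds those request bytes following the CONNECT method in RFC 7231 (section 4.3.6); `build_connect_request` is a hypothetical helper, and writing the bytes to the proxy's TLS transport is deliberately left out.

```python
from typing import Optional


def build_connect_request(host: str, port: int, proxy_auth: Optional[bytes] = None) -> bytes:
    """Hypothetical helper: bytes to send to the proxy, after the TLS handshake with it."""
    # IPv6 literals would need [brackets] in the authority; not handled in this sketch
    authority = f"{host}:{port}".encode("ascii")
    lines = [b"CONNECT " + authority + b" HTTP/1.1", b"Host: " + authority]
    if proxy_auth is not None:
        # Same value the handler reads from request.headers[b'Proxy-Authorization']
        lines.append(b"Proxy-Authorization: " + proxy_auth)
    # Header block ends with an empty line
    return b"\r\n".join(lines) + b"\r\n\r\n"
```

For example, `build_connect_request("example.com", 443)` produces `b"CONNECT example.com:443 HTTP/1.1\r\nHost: example.com:443\r\n\r\n"`.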
I tried to write a custom download handler for https (via DOWNLOAD_HANDLERS), but I'm not an expert in this:
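```python
from time import time
from urllib.parse import urldefrag

from twisted.internet import reactor
from twisted.web.http_headers import Headers as TxHeaders

from scrapy.core.downloader.handlers.http11 import (
    HTTP11DownloadHandler, ScrapyProxyAgent, TunnelingAgent, _RequestBodyProducer)
from scrapy.core.downloader.webclient import _parse
from scrapy.utils.python import to_bytes, to_unicode


class MyHTTPDownloader(HTTP11DownloadHandler):
    def download_request(self, request, spider):
        """Return a deferred for the HTTP download"""
        # download_timeout is normally filled in by DownloadTimeoutMiddleware
        timeout = request.meta.get('download_timeout') or self._connectTimeout
        bindaddress = request.meta.get('bindaddress')
        proxy = request.meta.get('proxy')
        # Agent that sends the request to the proxy URI itself (TLS for an https:// proxy)
        agent = ScrapyProxyAgent(reactor, proxyURI=to_bytes(proxy, encoding='ascii'),
                                 connectTimeout=timeout, bindAddress=bindaddress,
                                 pool=self._pool)
        _, _, proxyHost, proxyPort, proxyParams = _parse(proxy)
        proxyHost = to_unicode(proxyHost)
        url = urldefrag(request.url)[0]
        method = to_bytes(request.method)
        headers = TxHeaders(request.headers)
        omitConnectTunnel = b'noconnect' in proxyParams
        proxyConf = (proxyHost, proxyPort,
                     request.headers.get(b'Proxy-Authorization', None))
        if request.body:
            bodyproducer = _RequestBodyProducer(request.body)
        elif method == b'POST':
            bodyproducer = _RequestBodyProducer(b'')
        else:
            bodyproducer = None
        start_time = time()
        # Agent that sends CONNECT to the proxy; created here but never used
        tunnelingAgent = TunnelingAgent(reactor, proxyConf,
                                        contextFactory=self._contextFactory,
                                        connectTimeout=timeout,
                                        bindAddress=bindaddress, pool=self._pool)
        # Fires with a twisted.web Response; Scrapy's own ScrapyAgent converts
        # that into a scrapy Response, which this handler does not do yet
        return agent.request(method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
```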
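For Scrapy to call a handler like this for https:// URLs, it has to be registered under the `https` key of the `DOWNLOAD_HANDLERS` setting. A minimal sketch, assuming the class lives in a hypothetical module `myproject/handlers.py`:

```python
# settings.py (hypothetical project layout: myproject/handlers.py defines MyHTTPDownloader)
DOWNLOAD_HANDLERS = {
    # Only https:// requests go through the custom handler; http:// keeps Scrapy's default
    "https": "myproject.handlers.MyHTTPDownloader",
}
```

Keeping `http` on the default handler means the ScrapyProxyAgent path that already works is left untouched while the https path is being debugged.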
Once the proxy agent has connected, I need to establish a tunnel through that connection. Is that possible?

Thanks in advance.
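Establishing the tunnel means that, after the CONNECT request is written over the TLS connection to the proxy, the proxy's reply has to be read up to the blank line before any bytes for the target site are sent. A minimal sketch of that check, assuming the reply is accumulated in a byte buffer; `split_connect_response` is a hypothetical helper, not a Scrapy or Twisted API, and it only parses the status line.

```python
import re
from typing import Optional, Tuple

_STATUS_LINE = re.compile(rb"^HTTP/1\.[01] (\d{3})")


def split_connect_response(buffer: bytes) -> Optional[Tuple[int, bytes]]:
    """Hypothetical helper: (status, leftover bytes) once the reply head is complete, else None."""
    head, sep, rest = buffer.partition(b"\r\n\r\n")
    if not sep:
        return None  # reply head not complete yet; wait for more data from the proxy
    match = _STATUS_LINE.match(head)
    if match is None:
        raise ValueError("proxy reply is not an HTTP/1.x response")
    # Anything after the blank line already belongs to the tunnel
    return int(match.group(1)), rest
```

Per RFC 7231, a 2xx status means the tunnel is open; the next step would then be the TLS handshake with the target site over the same, already encrypted connection (TLS inside TLS), while a status such as 407 means the proxy rejected the credentials.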