Scrapy with Tor / Privoxy is unable to crawl [Connection refused 61]

Time: 2017-09-10 14:31:28

Tags: proxy scrapy tor polipo privoxy

My middlewares.py contains the following code, as mentioned here. I am trying to change my IP through Tor on every request.

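import random

from scrapy.utils.project import get_project_settings
from stem import Signal
from stem.control import Controller

# Assumption: the original imports are not shown; settings are loaded this way here.
# USER_AGENT_LIST is expected to be defined in the project's settings.py.
settings = get_project_settings()

def _set_new_ip():
    # Ask Tor for a new identity through its control port (9051).
    with Controller.from_port(port=9051) as controller:
        controller.authenticate(password='tor_password')
        controller.signal(Signal.NEWNYM)

class RandomUserAgentMiddleware(object):
    def process_request(self, request, spider):
        # Pick a random User-Agent unless the request already sets one.
        ua = random.choice(settings.get('USER_AGENT_LIST'))
        if ua:
            request.headers.setdefault('User-Agent', ua)

class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # Request a new Tor identity, then route the request through the local proxy.
        _set_new_ip()
        request.meta['proxy'] = 'http://127.0.0.1:8118'
        spider.log('Proxy : %s' % request.meta['proxy'])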
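
A side note on the per-request NEWNYM call above: Tor rate-limits NEWNYM signals, so signalling on every request may not produce a new identity each time. The sketch below shows one possible way to throttle the call. The class name `NewnymThrottle` and the 10-second default are assumptions made for illustration, not part of the original code; stem's `Controller.get_newnym_wait()` can report the wait Tor actually expects.

```python
import time


class NewnymThrottle:
    """Hypothetical helper: allow an action at most once per `min_interval` seconds."""

    def __init__(self, min_interval=10.0, clock=time.monotonic):
        # min_interval=10.0 is an assumed default, not a value taken from the question.
        self.min_interval = min_interval
        self.clock = clock
        self._last = None  # time of the last allowed action, None if never allowed

    def ready(self):
        """Return True (and record the time) if enough time has passed."""
        now = self.clock()
        if self._last is None or now - self._last >= self.min_interval:
            self._last = now
            return True
        return False
```

In `ProxyMiddleware.process_request`, a module-level `throttle = NewnymThrottle()` could then guard the call, for example `if throttle.ready(): _set_new_ip()`. This does not address the connection error itself.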

But when I try to start crawling with Scrapy, it keeps responding with:

2017-09-10 22:36:44 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-09-10 22:36:44 [stem] DEBUG: GETCONF __owningcontrollerprocess (runtime: 0.0004)
2017-09-10 22:36:44 [stem] INFO: Error while receiving a control message (SocketClosed): empty socket content
2017-09-10 22:36:44 [IT] DEBUG: Proxy : http://127.0.0.1:8118
2017-09-10 22:36:44 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.jobstreet.com.sg/en/job-search/job-vacancy.php?ojs=10&key=information%20technology> (failed 1 times): Connection was refused by other side: 61: Connection refused.
2017-09-10 22:36:44 [stem] DEBUG: GETCONF __owningcontrollerprocess (runtime: 0.0003)
2017-09-10 22:36:44 [stem] INFO: Error while receiving a control message (SocketClosed): empty socket content
2017-09-10 22:36:44 [IT] DEBUG: Proxy : http://127.0.0.1:8118
2017-09-10 22:36:52 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.jobstreet.com.sg/en/job-search/job-vacancy.php?ojs=10&key=information%20technology> (failed 2 times): Connection was refused by other side: 61: Connection refused.
2017-09-10 22:36:52 [stem] DEBUG: GETCONF __owningcontrollerprocess (runtime: 0.0004)
2017-09-10 22:36:52 [stem] INFO: Error while receiving a control message (SocketClosed): empty socket content
2017-09-10 22:36:52 [IT] DEBUG: Proxy : http://127.0.0.1:8118
2017-09-10 22:36:56 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.jobstreet.com.sg/en/job-search/job-vacancy.php?ojs=10&key=information%20technology> (failed 3 times): Connection was refused by other side: 61: Connection refused.
2017-09-10 22:36:56 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.jobstreet.com.sg/en/job-search/job-vacancy.php?ojs=10&key=information%20technology>: Connection was refused by other side: 61: Connection refused.
2017-09-10 22:36:56 [scrapy.core.engine] INFO: Closing spider (finished)
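
In the log above, error 61 is typically the errno for ECONNREFUSED on macOS, which suggests that nothing accepted the TCP connection, possibly at the proxy address 127.0.0.1:8118. The stem "SocketClosed" lines may likewise point at a problem on the control port 9051. As a minimal diagnostic sketch (the helper name `port_is_open` is hypothetical, and the ports are simply the ones used in the code above), one could check whether both local ports accept connections before starting the spider:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the question: 8118 (HTTP proxy) and 9051 (Tor control port).
    for name, port in (("HTTP proxy", 8118), ("Tor control port", 9051)):
        state = "open" if port_is_open("127.0.0.1", port) else "refused or unreachable"
        print("%s (127.0.0.1:%d): %s" % (name, port, state))
```

If port 8118 is reported as refused, the proxy service is probably not running or is listening on a different address. If 9051 is refused or closes immediately, the Tor control port configuration would be the next thing to check.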

0 answers:

No answers yet