I'm getting an error when trying to test my Scrapy installation:
$ scrapy shell http://www.google.es
2011-02-16 10:54:46+0100 [scrapy] INFO: Scrapy 0.12.0.2536 started (bot: scrapybot)
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpProxyMiddleware, HttpCompressionMiddleware, DownloaderStats
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled item pipelines:
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-02-16 10:54:46+0100 [default] INFO: Spider opened
2011-02-16 10:54:47+0100 [default] DEBUG: Retrying <GET http://www.google.es> (failed 1 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] DEBUG: Retrying <GET http://www.google.es> (failed 2 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] DEBUG: Discarding <GET http://www.google.es> (failed 3 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] ERROR: Error downloading <http://www.google.es>: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionRefusedError'>: Connection was refused by other side: 111: Connection refused.
]
2011-02-16 10:54:47+0100 [scrapy] ERROR: Shell error
Traceback (most recent call last):
Failure: scrapy.exceptions.IgnoreRequest: Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] INFO: Closing spider (shutdown)
2011-02-16 10:54:47+0100 [default] INFO: Spider closed (shutdown)
Version:
Answer 0 (score: 8)
Task 1: Scrapy sends a user agent that contains 'bot', and some sites block requests based on the user agent. Try overriding USER_AGENT in settings.py, for example:
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.7'
Task 2: Try adding a delay between requests, so they look like they were sent by a human.
DOWNLOAD_DELAY = 0.25
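A minimal sketch of how both suggestions could look together in settings.py (the user-agent string is just the example above; the delay value is the one suggested and can be tuned):

```python
# settings.py -- sketch combining Task 1 and Task 2 above.

# Task 1: override the default "bot" user agent with a browser-like one.
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.7'

# Task 2: wait between requests so the crawl looks less automated.
DOWNLOAD_DELAY = 0.25
```

Scrapy reads these module-level names from the project's settings module at startup, so no extra wiring is needed beyond placing them in settings.py.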
Task 3: If that doesn't work, install Wireshark and compare the request headers (or POST data) sent by Scrapy with those sent by your browser.
Answer 1 (score: 1)
There may be a problem with your network connection.
First, check your Internet connection.
If you access the network through a proxy server, you should add the corresponding configuration to your Scrapy project (http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware).
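If a proxy is involved, the HttpProxyMiddleware (enabled by default, as the log above shows) picks the proxy up from the standard environment variables; a hedged sketch, with `proxy.example.com:8080` as a placeholder address:

```python
import os

# HttpProxyMiddleware reads proxy settings from the standard environment
# variables (http_proxy, https_proxy) when the crawl starts.
# 'http://proxy.example.com:8080' is a placeholder, not a real proxy.
os.environ['http_proxy'] = 'http://proxy.example.com:8080'
```

A proxy can also be set per request from a spider via `request.meta['proxy']`, which is useful when only some requests should go through the proxy.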
In any case, try upgrading your Scrapy version.
Answer 2 (score: 0)
I ran into this error too. It turned out the port I was accessing was blocked by a firewall: my server blocks ports by default unless they are explicitly whitelisted.
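One quick way to rule a firewall in or out is to test the TCP port directly from the machine running Scrapy, before blaming the crawler itself; a small sketch using only the Python standard library:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check whether the site's HTTP port is reachable at all.
# port_open('www.google.es', 80)
```

If this returns False for port 80 while a browser on another machine can reach the site, the problem is the network path (firewall, proxy, DNS) rather than Scrapy.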