I am trying to replicate the following Scrapy tutorial - http://blog.florian-hopf.de/2014/07/scrapy-and-elasticsearch.html.
When I run the spider, I get the following error trace -
[scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-04-16 14:00:41 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-04-16 14:00:41 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.meetup.com/robots.txt> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2018-04-16 14:00:41 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.meetup.com/robots.txt> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2018-04-16 14:00:41 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://www.meetup.com/robots.txt> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2018-04-16 14:00:41 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http://www.meetup.com/robots.txt>: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
Traceback (most recent call last):
  File "D:\anaconda\lib\site-packages\twisted\internet\defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "D:\anaconda\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "D:\anaconda\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
Can someone help me resolve or understand this issue?
Answer 0 (score: 0)
It looks like you cannot reach the destination address. What is the output of: wget http://www.meetup.com/robots.txt ?
In any case, meetup.com has moved to HTTPS and you are trying to hit the HTTP endpoint. Try changing start_urls to HTTPS, for example:
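If wget is not available (for example on Windows), a similar reachability check can be done from Python. This is just a hypothetical sketch using the requests library, not part of the original answer:

import requests

# Hypothetical check (not from the original answer): see whether the
# robots.txt endpoint responds over HTTP and over HTTPS.
for url in ("http://www.meetup.com/robots.txt",
            "https://www.meetup.com/robots.txt"):
    try:
        resp = requests.get(url, timeout=10)
        print(url, "->", resp.status_code)
    except requests.RequestException as exc:
        print(url, "-> failed:", exc)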
start_urls = [
    "https://www.meetup.com/Search-Meetup-Karlsruhe/"
]
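For context, a minimal spider sketch using the HTTPS start_urls might look like this; the class name, spider name, and parse logic are assumptions for illustration, not taken from the tutorial:

import scrapy

class MeetupSpider(scrapy.Spider):
    # Hypothetical sketch; only start_urls matters here - the names and the
    # parsing logic are placeholders, not the tutorial's actual code.
    name = "meetup"
    start_urls = [
        "https://www.meetup.com/Search-Meetup-Karlsruhe/",  # https instead of http
    ]

    def parse(self, response):
        # Placeholder parse: yield the page title so the spider runs end to end.
        yield {"title": response.css("title::text").extract_first()}

Once every request (including the robots.txt fetch) goes over HTTPS, the retry errors shown in the log above should no longer appear if the original problem was the HTTP endpoint.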