Scrapy - Unable to hit the URL for web scraping

Time: 2018-04-16 08:54:29

Tags: python url web-scraping scrapy

I am trying to replicate the following Scrapy tutorial - http://blog.florian-hopf.de/2014/07/scrapy-and-elasticsearch.html

When I run the spider, I get the following error trace -

[scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-04-16 14:00:41 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-04-16 14:00:41 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.meetup.com/robots.txt> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2018-04-16 14:00:41 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.meetup.com/robots.txt> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2018-04-16 14:00:41 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://www.meetup.com/robots.txt> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2018-04-16 14:00:41 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http://www.meetup.com/robots.txt>: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
Traceback (most recent call last):
  File "D:\anaconda\lib\site-packages\twisted\internet\defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "D:\anaconda\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "D:\anaconda\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))

Can someone help me resolve or understand this issue?

1 answer:

Answer 0 (score: 0)

It looks like you cannot reach the destination address. What is the output of: wget http://www.meetup.com/robots.txt
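If wget is not at hand (the paths in the traceback suggest a Windows machine), roughly the same check can be done from Python with the standard library. This is only a sketch: `probe` is a hypothetical helper name, and its `opener` parameter is an assumption added so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10, opener=urllib.request.urlopen):
    """Return a short description of what happened when fetching url."""
    try:
        with opener(url, timeout=timeout) as response:
            return f"{url} -> HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 403 or 404.
        return f"{url} -> HTTP {exc.code}"
    except OSError as exc:
        # Most network-level failures land here: DNS errors, refused or
        # dropped connections, timeouts (URLError is a subclass of OSError).
        return f"{url} -> failed: {exc!r}"
```

Calling `probe` once with the http URL and once with the https URL of robots.txt should show whether only the plain http endpoint is closing the connection.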

In any case, you are trying to hit the http endpoint, while Meetup has since moved to https. Try changing start_urls to https, for example:

start_urls = [
    "https://www.meetup.com/Search-Meetup-Karlsruhe/"
]
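If the spider lists several start URLs, the same scheme change could be applied in one place instead of editing each entry by hand. The following is a minimal sketch using only the standard library; `to_https` and `old_start_urls` are hypothetical names, not part of Scrapy or the tutorial.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url):
    """Return url with an http scheme replaced by https; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# Hypothetical list in the style of the tutorial's spider.
old_start_urls = ["http://www.meetup.com/Search-Meetup-Karlsruhe/"]
start_urls = [to_https(u) for u in old_start_urls]
```

This only rewrites the scheme; it does not check that the https endpoint actually responds.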