Scrapy: Error downloading <GET ['https://[VALID_WEBSITE']>

Date: 2018-10-24 15:48:18

Tags: python python-3.x scrapy

I'm building my own spider in Scrapy with CrawlSpider, and I'm using a Configuration class to make it configurable. The configuration has a start_urls attribute that gets passed to the spider. It looks like it is passed to the spider correctly, but I'm seeing the error below (see the first line).

2018-10-24 11:41:17 [scrapy.core.scraper] ERROR: Error downloading <GET ['https://[VALID_WEBSITE']>
Traceback (most recent call last):
  File "C:\Users\David\.virtualenvs\cluster-pD1pIc9C\lib\site-packages\twisted\internet\defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "C:\Users\David\.virtualenvs\cluster-pD1pIc9C\lib\site-packages\twisted\python\failure.py", line 491, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "C:\Users\David\.virtualenvs\cluster-pD1pIc9C\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "C:\Users\David\.virtualenvs\cluster-pD1pIc9C\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
result = f(*args, **kw)
File "C:\Users\David\.virtualenvs\cluster-pD1pIc9C\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 64, in download_request
(scheme, self._notconfigured[scheme]))
scrapy.exceptions.NotSupported: Unsupported URL scheme '': no handler available for that scheme
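
The last line suggests Scrapy cannot parse any scheme out of the URL at all. Scrapy chooses its download handler from the scheme that urllib.parse extracts, so stray brackets or quotes at the start of the string leave it empty (illustrative URL, not my real one):

from urllib.parse import urlparse

print(urlparse("https://example.com").scheme)      # 'https'
print(urlparse("['https://example.com']").scheme)  # '' -> "Unsupported URL scheme ''"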

This exact same URL works fine with the following command:

scrapy runspider main.py

So it must be something in my spider, but I'm not sure what.

from pydispatch import dispatcher
from scrapy import signals
from scrapy.spiders import CrawlSpider


class MainSpider(CrawlSpider):
    def __init__(self, configuration):
        super(MainSpider, self).__init__(configuration.name)
        dispatcher.connect(self.spider_closed, signals.spider_closed)
        self.configuration = configuration
        self.name = configuration.name
        self.allowed_domains = [configuration.allowed_domains]
        self.start_urls = [configuration.start_urls]
        self.product_link_id = configuration.product_link_id
        self.product_links = set()
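
Note that self.start_urls = [configuration.start_urls] blindly wraps whatever the configuration holds. A more defensive sketch (assuming the configured value may be either a single URL string or a list) would be:

def as_url_list(value):
    # Accept either a single URL string or an iterable of URLs.
    return [value] if isinstance(value, str) else list(value)

# in __init__: self.start_urls = as_url_list(configuration.start_urls)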

1 Answer:

Answer 0 (score: 0)

It turns out this error is almost always caused by an invalid URL. In this case, the URL contained an extra single quote; it looked fine when typed into a browser, but I simply needed to pass the URL without the stray quote.
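
A minimal sketch of that kind of cleanup (clean_url is a hypothetical helper; adapt it to however the configuration is loaded):

def clean_url(raw):
    # Strip stray quotes/brackets that can sneak in from a config file,
    # then make sure a usable scheme is left.
    url = raw.strip().strip("[]'\" ")
    if not url.startswith(("http://", "https://")):
        raise ValueError("Invalid start URL: %r" % raw)
    return url

# e.g. in the spider's __init__:
# self.start_urls = [clean_url(configuration.start_urls)]

Validating the URL up front turns the cryptic downloader error into an immediate, readable one.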