I am building my own spider in Scrapy using CrawlSpider, and I'm using a Configuration class to make it configurable. My configuration has a start_urls property that is passed to the spider. It looks like it is being passed correctly, but I'm seeing the following error (see the first line):
2018-10-24 11:41:17 [scrapy.core.scraper] ERROR: Error downloading <GET ['https://[VALID_WEBSITE']>
Traceback (most recent call last):
File "C:\Users\David\.virtualenvs\cluster-pD1pIc9C\lib\site-packages\twisted\internet\defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "C:\Users\David\.virtualenvs\cluster-pD1pIc9C\lib\site-packages\twisted\python\failure.py", line 491, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "C:\Users\David\.virtualenvs\cluster-pD1pIc9C\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "C:\Users\David\.virtualenvs\cluster-pD1pIc9C\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
result = f(*args, **kw)
File "C:\Users\David\.virtualenvs\cluster-pD1pIc9C\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 64, in download_request
(scheme, self._notconfigured[scheme]))
scrapy.exceptions.NotSupported: Unsupported URL scheme '': no handler available for that scheme
The exact same URL works fine with the following command:
scrapy runspider main.py
So it must be something in my spider, but I'm not sure what.
from scrapy import signals
from scrapy.spiders import CrawlSpider
from pydispatch import dispatcher

class MainSpider(CrawlSpider):
    def __init__(self, configuration):
        super(MainSpider, self).__init__(configuration.name)
        dispatcher.connect(self.spider_closed, signals.spider_closed)
        self.configuration = configuration
        self.name = configuration.name
        self.allowed_domains = [configuration.allowed_domains]
        self.start_urls = [configuration.start_urls]
        self.product_link_id = configuration.product_link_id
        self.product_links = set()
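For context, a minimal sketch of what my Configuration object looks like; the real class is larger, so treat the shape below as an assumption built only from the attributes the spider reads:

# Hypothetical Configuration holder (simplified); it only carries
# the attributes MainSpider reads in __init__.
class Configuration:
    def __init__(self, name, allowed_domains, start_urls, product_link_id):
        self.name = name
        self.allowed_domains = allowed_domains
        self.start_urls = start_urls
        self.product_link_id = product_link_id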
Answer 0 (score: 0):
It looks like this error is almost always caused by an invalid URL. In this case the URL contained extra single quotes (note the ['https:// in the error line), so Scrapy could not parse a scheme from it, which is why the exception reports the empty scheme ''. I simply needed to pass the plain URL, exactly as I would type it into a browser.
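For illustration, here is a small standalone sketch of why the exception complains about an empty scheme (the example URL is made up). Scrapy's downloader picks a handler based on the URL's scheme, and stray quote or bracket characters in front of https:// leave the parsed scheme empty:

from urllib.parse import urlparse

bad = "['https://example.com']"  # URL polluted with list/quote characters
good = "https://example.com"     # bare URL, as typed into a browser

print(urlparse(bad).scheme)   # '' -> "Unsupported URL scheme ''"
print(urlparse(good).scheme)  # 'https'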