I'm trying to scrape several pages from a website. To do this, I use a loop to append the page number to my URL. When I launch the spider, I get the error below. I changed my quotes from single to double and checked for stray whitespace, but the URLs look fine to me.
Do you know what's going wrong?
Here is my loop:
def url_lister():
    url_list = []
    page_count = 0
    while page_count < 2:
        url = "['https://www.active.com/running?page=%s" % page_count + "']"
        url_list.append(url)
        print(url)
        page_count += 1
    return url_list
Here are the resulting URLs:
['https://www.active.com/running?page=0']
-----------------------------
['https://www.active.com/running?page=1']
-----------------------------
["['https://www.active.com/running?page=0']", "['https://www.active.com/running?page=1']"]
-----------------------------
Here is the error message:
2018-01-23 14:31:34 [scrapy.middleware] INFO: Enabled item pipelines:
['ACTIVE.pipelines.ActivePipeline']
2018-01-23 14:31:34 [scrapy.core.engine] INFO: Spider opened
2018-01-23 14:31:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-01-23 14:31:34 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-01-23 14:31:34 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET :///robots.txt>: Unsupported URL scheme '': no handler available for that scheme
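The log line `Unsupported URL scheme ''` points at the root cause: the strings built by the loop embed literal brackets and quotes, so Scrapy cannot find an `https` scheme in them. A quick check with the standard-library `urllib.parse` (the variable names here are just for illustration) shows the difference:

```python
from urllib.parse import urlparse

# The string produced by the loop carries literal brackets and quotes,
# so it is not a valid URL on its own.
bad = "['https://www.active.com/running?page=0']"
good = "https://www.active.com/running?page=0"

print(repr(urlparse(bad).scheme))   # '' -- no scheme recognized
print(repr(urlparse(good).scheme))  # 'https'
```

An empty scheme is exactly what the `robotstxt` middleware complains about in the log.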
Answer 0 (score: 0)
After several tests I changed my code, and it works fine now.
Old code:
def url_lister():
    url_list = []
    page_count = 0
    while page_count < 2:
        url = "['https://www.active.com/running?page=%s" % page_count + "']"
        url_list.append(url)
        print(url)
        page_count += 1
    return url_list
New code:
def url_lister():
    url_list = []
    page_count = 0
    while page_count < 480:
        url = 'https://www.active.com/running?page=%s' % page_count
        url_list.append(url)
        page_count += 1
    return url_list
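The fix comes down to the string-formatting expression: the old version baked the list brackets and quotes into the string itself, while the new one builds a plain URL. A minimal side-by-side sketch (page number 0 chosen just for illustration):

```python
page_count = 0

# Old expression: the brackets and quotes become part of the string itself
old_url = "['https://www.active.com/running?page=%s" % page_count + "']"
print(old_url)  # ['https://www.active.com/running?page=0']

# New expression: a plain URL with a real https:// scheme
new_url = 'https://www.active.com/running?page=%s' % page_count
print(new_url)  # https://www.active.com/running?page=0
```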