I am trying to create a simple scraper to collect quotes from the website http://quotes.toscrape.com/. The output should be stored in HTML files. However, when I run the code, nothing gets saved; the terminal shows that it crawled 0 pages.
Terminal output
Here is the code. Can you help me figure out what is wrong? Thanks.
import scrapy

class SimpleSpider(scrapy.Spider):
    name = "SimpleSpider"

    def start_request(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url, self.parse)

    def parse(self, response):
        page = response.url.split('/')[-1]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Files saved to %s' % filename)
Answer 0 (score: 1)
I think this is just a naming problem: use start_requests instead of start_request. Scrapy only calls the method named exactly start_requests, so the misspelled version is never invoked and no requests are scheduled.
See: https://docs.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.start_requests
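To see why the typo fails silently, here is a pure-Python sketch (no Scrapy required) of the override mechanism. The Spider base class below is a simplified stand-in for scrapy.Spider, whose default start_requests builds requests from start_urls; since the question's spider defines neither start_urls nor a correctly named start_requests, the default yields nothing.

```python
class Spider:
    # Simplified stand-in for scrapy.Spider's default behaviour (assumption:
    # the real default also wraps each URL in a Request object).
    start_urls = []

    def start_requests(self):
        for url in self.start_urls:
            yield url

class BrokenSpider(Spider):
    def start_request(self):  # typo: the framework never calls this name
        yield 'http://quotes.toscrape.com/page/1/'

class FixedSpider(Spider):
    def start_requests(self):  # correct name: overrides the base default
        yield 'http://quotes.toscrape.com/page/1/'

print(len(list(BrokenSpider().start_requests())))  # 0 -> "Crawled 0 pages"
print(len(list(FixedSpider().start_requests())))   # 1
```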
Answer 1 (score: 0)
Please check this; it works for me. It follows the standard Scrapy spider structure.
import scrapy

class SimpleSpider(scrapy.Spider):
    name = "SimpleSpider"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse)

    def parse(self, response):
        page = response.url.split('/')[-1]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Files saved to %s' % filename)
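One more detail worth checking (my own observation, not part of the original answers): the request URLs end with a trailing slash, so response.url.split('/')[-1] is the empty string, and both pages are written to the same quotes-.html file. Indexing with [-2], as the official Scrapy tutorial does, picks up the actual page number:

```python
url = 'http://quotes.toscrape.com/page/1/'
print(url.split('/')[-1])  # '' -> filename becomes 'quotes-.html'
print(url.split('/')[-2])  # '1' -> filename becomes 'quotes-1.html'
```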