Scrapy does not crawl the URLs

Posted: 2020-09-05 10:18:54

Tags: python scrapy

I am trying to create a simple scraper that collects quotes from the site http://quotes.toscrape.com/. The output should be stored in HTML files. However, when I run the code, nothing is written. The terminal shows that it crawled 0 pages.

The code is below. Could you help me fix it? Thanks.

import scrapy

class SimpleSpider(scrapy.Spider):
    name ="SimpleSpider"
    
    def start_request(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
            ]
        
        for url in urls:
            yield scrapy.Request(url, self.parse)
            
    def parse(self, response):
        page = response.url.split('/')[-1]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Files saved to %s' % filename)

2 Answers:

Answer 0 (score: 1)

I think this is just a naming problem: use start_requests instead of start_request.

See: https://docs.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.start_requests
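Scrapy's engine looks this method up by its exact name, so a misnamed override is silently ignored and the base class's default runs instead; since the spider also defines no start_urls, the default produces no requests, which matches the "crawled 0 pages" output. A minimal plain-Python sketch of this lookup behaviour (no Scrapy required; the class names here are illustrative, not Scrapy's):

```python
class BaseSpider:
    # stands in for scrapy.Spider's default start_requests()
    def start_requests(self):
        return ["request from default start_requests"]

class Misnamed(BaseSpider):
    # typo: the framework never calls this method
    def start_request(self):
        return ["request from my override"]

# the engine calls start_requests(); the misnamed override is never used
print(Misnamed().start_requests())  # ['request from default start_requests']
```

Renaming the method to start_requests makes the override take effect.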

Answer 1 (score: 0)

Please check this; it works for me. It follows the standard Scrapy spider pattern.

import scrapy


class SimpleSpider(scrapy.Spider):
    name = "SimpleSpider"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # these URLs end with '/', so split('/')[-1] is an empty string;
        # use [-2] to get the actual page number
        page = response.url.split('/')[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)
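One pitfall worth checking with these page URLs is the trailing slash: splitting on '/' leaves an empty string as the last element, so indexing with [-1] would name every output file quotes-.html and each page would overwrite the last. A quick check:

```python
url = 'http://quotes.toscrape.com/page/1/'
parts = url.split('/')
print(repr(parts[-1]))  # '' -- trailing slash leaves an empty last element
print(repr(parts[-2]))  # '1' -- the actual page number
```

Indexing with [-2] gives one distinct file per page.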