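scrapy.Spider subclass: unable to call an instance method

Time: 2020-07-29 04:15:11

Tags: python scrapy

First of all, I am new to Python and to web scraping. I just want to call an instance method from within my Spider subclass.

Code: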

import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'http://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }
        print("*** call next page function")
        self.parse_next_page(response)

    def parse_next_page(self, response):
        print("*** parse next page function invoked")
        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

Output:

2020-07-29 09:30:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/humor/> (referer: None)
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Jane Austen', 'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Steve Martin', 'text': '“A day without sunshine is like, you know, night.”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Garrison Keillor', 'text': '“Anyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Jim Henson', 'text': '“Beauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Charles M. Schulz', 'text': "“All you need is love. But a little chocolate now and then doesn't hurt.”"}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Suzanne Collins', 'text': "“Remember, we're madly in love, so it's all right to kiss me anytime you feel like it.”"}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Charles Bukowski', 'text': '“Some people never go crazy. What truly horrible lives they must lead.”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Terry Pratchett', 'text': '“The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Dr. Seuss', 'text': '“Think left and think right and think low and think high. Oh, the thinks you can think up if only you try!”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'George Carlin', 'text': '“The reason I talk to myself is because I’m the only one whose answers I accept.”'}
*** call next page function
2020-07-29 09:30:39 [scrapy.core.engine] INFO: Closing spider (finished)

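As you can see, the instance method "parse_next_page" is never invoked. Please let me know what I am doing wrong here.

1 answer:

Answer 0: (score: 1)

Try handling the pagination directly in parse, yielding the next-page request there instead of calling a separate method. This works fine for me: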

def parse(self, response):
    # Yield one item per quote on the current page.
    for quote in response.css('div.quote'):
        yield {
            'author': quote.xpath('span/small/text()').get(),
            'text': quote.css('span.text::text').get(),
        }
    # Follow the "next" link, if any, and parse that page with this same callback.
    next_page = response.css('li.next a::attr("href")').get()
    if next_page is not None:
        yield response.follow(next_page, self.parse)
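
Why the original call seemed to do nothing: parse_next_page contains a yield, so Python treats it as a generator function. Calling it only creates a generator object, and the body (including the print) does not run until something iterates over it. The sketch below reproduces this in plain Python without Scrapy. The Demo class, its method names, and the calls list are hypothetical stand-ins for illustration and are not part of the question's spider.

```python
calls = []  # records each time the body of parse_next_page actually runs


class Demo:
    """Plain-Python stand-in for the spider; no Scrapy required."""

    def parse_broken(self, page):
        yield {"item": page}
        # Creates a generator object and discards it: the body never runs.
        self.parse_next_page(page)

    def parse_fixed(self, page):
        yield {"item": page}
        # Delegates to the sub-generator, so its body runs and everything
        # it yields is passed on to whoever iterates over parse_fixed.
        yield from self.parse_next_page(page)

    def parse_next_page(self, page):
        calls.append(page)
        yield f"request for page {page + 1}"


demo = Demo()

broken = list(demo.parse_broken(1))
calls_after_broken = list(calls)  # snapshot before running the fixed variant

fixed = list(demo.parse_fixed(1))

print(broken)              # only the item; no follow-up request
print(calls_after_broken)  # empty: parse_next_page never executed
print(fixed)               # the item plus the follow-up request
print(calls)               # parse_next_page ran once
```

If a separate method is preferred in the spider, the equivalent change would be to write "yield from self.parse_next_page(response)" in parse instead of the bare call, so that the request produced by response.follow reaches Scrapy.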