Question

I'm trying to scrape the quotes from https://www.goodreads.com/quotes. I only get the first page; following the link to the next page doesn't work. Here is my code:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'

    start_urls = [
        'http://www.goodreads.com/quotes'
    ]

    def parse(self,response):

        for quote in response.xpath("//div[@class='quote']"):
            yield {
                'quoteText': quote.xpath(".//div[@class ='quoteText']").extract_first()
            }

        next_page=response.css("a").xpath("@href").extract()
        if next_page is not None:
            next_page_link=response.urljoin(next_page)
            yield scrapy.Request(url=next_page_link, callback= self.parse)

Answer 1

You need the href of the "next page" link specifically, not of every link on the page. Use this to get the next-page URL:

next_page=response.css("a.next_page::attr(href)").get()

You can read more about selectors here: https://docs.scrapy.org/en/latest/topics/selectors.html

How to navigate to the next page with Scrapy (web scraping)
