Question

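I was following an online tutorial, but it turned out to be outdated. I can't work out why my spider doesn't scrape the next page. Here is the code.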

I have also set ROBOTSTXT_OBEY = False.

import scrapy


class EcigPageSpider(scrapy.Spider):
    name = 'ecig_page'
    allowed_domains = ['www.cigabuy.com/']
    start_urls = [
        'https://www.cigabuy.com/consumer-electronics-c-56_75-pg-1.html'
    ]

    def parse(self, response):
        for product in response.xpath("//div[@class='p_box_wrapper']"):
            title = product.xpath(".//a[@class='p_box_title']/text()").get()
            url = product.xpath(".//a[@class='p_box_title']/@href").get()
            discounted_price = product.xpath(".//div[@class='p_box_price cf']/span[1]/text()").get()
            original_price = product.xpath(".//div[@class='p_box_price cf']/span[2]/text()").get()

            yield {
                'title': title,
                'url': url,
                'discounted_price': discounted_price,
                'original_price': original_price,
            }

        next_page = response.xpath("//a[@class='nextPage']/@href").get()

        if next_page:
            yield response.follow(url=next_page, callback=self.parse)

I also tried changing the body of the if next_page: block to

yield scrapy.Request(url=next_page, callback=self.parse)

It still doesn't work.

Answer 1

I've figured it out: I just needed to remove the extra "/" at the end of the domain in this line

allowed_domains = ['www.cigabuy.com/']

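Make sure the entry has no trailing slash, i.e. it should read 'www.cigabuy.com'. Entries in allowed_domains are bare domain names, not URLs.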
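
For anyone wondering why one stray character matters: as far as I understand it, Scrapy's offsite filtering compares the hostname of each followed request against the entries in allowed_domains, and a hostname never contains a "/", so 'www.cigabuy.com/' can never match. The start URL is still fetched because start requests are not subject to that filter, which is why the first page scrapes fine and only the next-page request is dropped (the log usually shows a "Filtered offsite request" debug line when this happens). Below is a minimal sketch of that kind of check using only the standard library. It is a simplified approximation, not Scrapy's actual code; is_allowed is a hypothetical helper and the pg-2 URL is an assumed example of what the nextPage link might resolve to.

```python
import re
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's hostname equals, or is a subdomain of,
    one of the allowed domains (simplified approximation of an offsite check)."""
    # Build one pattern that accepts an optional subdomain prefix
    # followed by exactly one of the allowed domain strings.
    pattern = r"^(.*\.)?(%s)$" % "|".join(re.escape(d) for d in allowed_domains)
    # urlparse(...).hostname is just the host part, e.g. 'www.cigabuy.com'
    host = urlparse(url).hostname or ""
    return re.search(pattern, host) is not None


# Assumed example of a next-page URL; not taken from the site itself.
next_page = "https://www.cigabuy.com/consumer-electronics-c-56_75-pg-2.html"

print(is_allowed(next_page, ["www.cigabuy.com/"]))  # False: trailing slash never matches a hostname
print(is_allowed(next_page, ["www.cigabuy.com"]))   # True
print(is_allowed(next_page, ["cigabuy.com"]))       # True: subdomains of the entry are accepted
```

Under the same assumption, listing just 'cigabuy.com' would also work and would cover any other subdomains the site links to.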
