Question

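I am scraping the following website, https://www.trollandtoad.com/magic-the-gathering/aether-revolt/10066, and I am trying to click the "Next" button to go to the next page and scrape it. I have done this in other programs, so I reused the same code and adapted it to this site, but it does not work. It only scrapes the first page.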


    def parse(self, response):
        for game in response.css('div.card > div.row'):
            item = GameItem()
            item["Category"] = game.css("div.col-12.prod-cat a::text").get()
            item["Card_Name"]  = game.css("a.card-text::text").get()
            for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
                item["Seller"] = buying_option.css('div.row.align-center.py-2.m-auto > div.col-3.text-center.p-1 > img::attr(title)').get()
                item["Condition"] = buying_option.css("div.col-3.text-center.p-1::text").get()
                item["Price"] = buying_option.css("div.col-2.text-center.p-1::text").get()
                yield item
            next_page = response.xpath('//a[contains(., "Next Page")]/@href').get()
            # If it exists and there is a next page enter if statement
            if next_page is not None:
                # Go to next page
                yield response.follow(next_page, self.parse)

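Update #1

Here is a snapshot of the HTML for the Next button.

Update #2

Here is the code I have updated to try to go to the next page. It still does not work, but I think I am closer to the correct code.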

    next_page = response.xpath('//div[contains(., "Next Page")]/@class').get()
    # If it exists and there is a next page enter if statement
    if next_page is not None:
        # Go to next page
        yield response.follow(next_page, self.parse)

Answer 1

You need to find the next page number and then submit the form with that page number:

def parse(self, response):

    for game in response.css('div.card > div.row'):
        item = GameItem()
        item["Category"] = game.css("div.col-12.prod-cat a::text").get()
        item["Card_Name"]  = game.css("a.card-text::text").get()
        for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
            item["Seller"] = buying_option.css('div.row.align-center.py-2.m-auto > div.col-3.text-center.p-1 > img::attr(title)').get()
            item["Condition"] = buying_option.css("div.col-3.text-center.p-1::text").get()
            item["Price"] = buying_option.css("div.col-2.text-center.p-1::text").get()
            yield item
    next_page_number = response.xpath('//div[div[.="Next Page"]][not(contains(@class, "hide"))]/@data-page').get()
    # If it exists and there is a next page enter if statement
    if next_page_number:
        yield scrapy.FormRequest.from_response(
            response=response,
            formid="category_form",
            formdata={
                'page-no': next_page_number,
            },
            callback=self.parse
        )
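
To see what the selection step above is doing, here is a minimal sketch that can be run without Scrapy. It uses the standard library's `xml.etree.ElementTree` in place of Scrapy selectors, so it is not the same API as `response.xpath`. The sample markup, the `pageLink` class name, and the helper `find_next_page_number` are all assumptions made for illustration, inferred from the XPath in the answer; the real page may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup, inferred from the XPath in the answer.
# The "Next Page" control is assumed to be a <div> with a data-page
# attribute, not an <a> with an href.
SAMPLE = """
<form id="category_form">
  <input type="hidden" name="page-no" value="1" />
  <div class="pageLink hide" data-page="0"><div>Previous Page</div></div>
  <div class="pageLink" data-page="2"><div>Next Page</div></div>
</form>
"""


def find_next_page_number(markup):
    """Return data-page of the visible "Next Page" button, or None."""
    root = ET.fromstring(markup)
    # ElementTree's XPath subset supports [child='text'] and [@attr],
    # but not contains() or not(), so the "hide" check is done in Python.
    for div in root.findall(".//div[div='Next Page'][@data-page]"):
        if "hide" not in div.get("class", ""):
            return div.get("data-page")
    return None


if __name__ == "__main__":
    print(find_next_page_number(SAMPLE))
```

If the markup really looks like this, it would also explain why an XPath looking for `//a[...]/@href` finds nothing: there is no link to follow, only a page number that has to be sent back as the `page-no` form field.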

Clicking the Next button in Scrapy