我正在抓取以下网站https://www.trollandtoad.com/magic-the-gathering/aether-revolt/10066,并且我试图单击“下一步”按钮转到下一页并进行抓取。我已经在其他程序上完成了此操作,因此我只是使用相同的代码,并进行了修改以与当前网站一起使用,但无法正常工作。它只会抓取第一页。
def parse(self, response):
for game in response.css('div.card > div.row'):
item = GameItem()
item["Category"] = game.css("div.col-12.prod-cat a::text").get()
item["Card_Name"] = game.css("a.card-text::text").get()
for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
item["Seller"] = buying_option.css('div.row.align-center.py-2.m-auto > div.col-3.text-center.p-1 > img::attr(title)').get()
item["Condition"] = buying_option.css("div.col-3.text-center.p-1::text").get()
item["Price"] = buying_option.css("div.col-2.text-center.p-1::text").get()
yield item
next_page = response.xpath('//a[contains(., "Next Page")]/@href').get()
# If it exists and there is a next page enter if statement
if next_page is not None:
# Go to next page
yield response.follow(next_page, self.parse)
更新#1
这是下一个按钮的HTML代码的快照
更新#2
这是我必须尝试更新的代码,然后转到下一页。仍然无法正常工作,但我想我更接近正确的代码。
next_page = response.xpath('//div[contains(., "Next Page")]/@class').get()
# If it exists and there is a next page enter if statement
if next_page is not None:
# Go to next page
yield response.follow(next_page, self.parse)
答案 0 :(得分:1)
您需要找到下一个页码,然后使用该页码提交表单:
def parse(self, response):
for game in response.css('div.card > div.row'):
item = GameItem()
item["Category"] = game.css("div.col-12.prod-cat a::text").get()
item["Card_Name"] = game.css("a.card-text::text").get()
for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
item["Seller"] = buying_option.css('div.row.align-center.py-2.m-auto > div.col-3.text-center.p-1 > img::attr(title)').get()
item["Condition"] = buying_option.css("div.col-3.text-center.p-1::text").get()
item["Price"] = buying_option.css("div.col-2.text-center.p-1::text").get()
yield item
next_page_number = response.xpath('//div[div[.="Next Page"]][not(contains(@class, "hide"))]/@data-page').get()
# If it exists and there is a next page enter if statement
if next_page_number:
yield scrapy.FormRequest.from_response(
response=response,
formid="category_form",
formdata={
'page-no': next_page_number,
},
callback=self.parse
)