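I followed a tutorial online, but its content is out of date. I can't figure out why the spider doesn't scrape the next page. Here is the code.
I have also set ROBOTSTXT_OBEY = False.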
    import scrapy


    class EcigPageSpider(scrapy.Spider):
        name = 'ecig_page'
        allowed_domains = ['www.cigabuy.com/']
        start_urls = [
            'https://www.cigabuy.com/consumer-electronics-c-56_75-pg-1.html'
        ]

        def parse(self, response):
            for product in response.xpath("//div[@class='p_box_wrapper']"):
                title = product.xpath(".//a[@class='p_box_title']/text()").get()
                url = product.xpath(".//a[@class='p_box_title']/@href").get()
                discounted_price = product.xpath(".//div[@class='p_box_price cf']/span[1]/text()").get()
                original_price = product.xpath(".//div[@class='p_box_price cf']/span[2]/text()").get()
                yield {
                    'title': title,
                    'url': url,
                    'discounted_price': discounted_price,
                    'original_price': original_price,
                }
            next_page = response.xpath("//a[@class='nextPage']/@href").get()
            if next_page:
                yield response.follow(url=next_page, callback=self.parse)
I also tried changing the body of the `if next_page:` block to

    yield scrapy.Request(url=next_page, callback=self.parse)

but it still doesn't work.
Answer 0 (score: 0)
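I figured it out: I just needed to remove the extra "/" from this line

    allowed_domains = ['www.cigabuy.com/']

Make sure the entry does not include a trailing slash, i.e. use 'www.cigabuy.com'. allowed_domains expects bare domain names, not URLs or paths. With the slash the entry never matches the host of the next-page link, so Scrapy's offsite filtering drops that request, while the start URL is still fetched because start requests are not subject to that filter.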
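
To illustrate why the trailing slash matters, here is a minimal standard-library sketch of a host-based domain check. It only approximates the idea behind Scrapy's offsite filtering and is not Scrapy's actual implementation; the helper name `is_allowed` and the `pg-2` URL are assumptions made up for this example.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Simplified offsite check (hypothetical helper, not Scrapy code).

    A URL passes if its host equals an allowed domain or is a subdomain of it.
    """
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# Assumed next-page link, modelled on the start URL from the question.
next_page = "https://www.cigabuy.com/consumer-electronics-c-56_75-pg-2.html"

# The host is "www.cigabuy.com", which never equals "www.cigabuy.com/".
print(is_allowed(next_page, ["www.cigabuy.com/"]))  # False
# Without the slash, the host matches.
print(is_allowed(next_page, ["www.cigabuy.com"]))   # True
# A bare registered domain also covers the "www" subdomain.
print(is_allowed(next_page, ["cigabuy.com"]))       # True
```

When a real crawl hits this problem, running Scrapy with debug logging should show the next-page request reported as filtered offsite, which is a quick way to confirm the diagnosis.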