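I want to scrape the links found inside a for loop. Inside the loop I fill in an item and pass it to a callback function. But why do the items in the callback all have the same values? Here is my code.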
import scrapy
import re
from scraper.product_items import Product


class ProductSpider(scrapy.Spider):
    name = "productspider"
    start_urls = [
        'http://www.website.com/category-page/',
    ]

    def parse(self, response):
        item = Product()
        for products in response.css("div.product-card"):
            link = products.css("a::attr(href)").extract_first()
            item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
            item['price'] = products.css("div.product-card__old-price::text").extract_first()
            yield scrapy.Request(url=link, callback=self.parse_product_page, meta={'item': item})

    def parse_product_page(self, response):
        item = response.meta['item']
        item['image'] = response.css("div.productImage::attr(data-big)").extract_first()
        return item
This is the result:
[
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image1.jpg"},
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image2.jpg"},
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image3.jpg"},
]
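As you can see, sku and price have the same value in every iteration. I want sku and price to be different for each result. If I yield the item directly from parse instead, changing the code like this: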
import scrapy
import re
from scraper.product_items import Product


class LazadaSpider(scrapy.Spider):
    name = "lazada"
    start_urls = [
        'http://www.lazada.co.id/beli-jam-tangan-kasual-pria/',
    ]

    def parse(self, response):
        item = Product()
        for products in response.css("div.product-card"):
            link = products.css("a::attr(href)").extract_first()
            item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
            item['price'] = products.css("div.product-card__old-price::text").extract_first()
            yield item
then the sku and price values are correct for every iteration:
[
{"sku": "CA199FA31FKAANID", "price": "299"},
{"sku": "SW437OTAA31QO3ANID", "price": "200"},
{"sku": "SW437OTAM1RAANID", "price": "235"},
]
Answer 0 (score: 1)
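You should create the item inside the for loop. Otherwise you are sharing one and the same item object across all iterations and only overwriting its values each time. So the correct code is: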
def parse(self, response):
    for products in response.css("div.product-card"):
        # Create a fresh item for every product so no two results share state.
        item = Product()
        link = products.css("a::attr(href)").extract_first()
        item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
        item['price'] = products.css("div.product-card__old-price::text").extract_first()
        yield item
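
To see why the shared item misbehaves only when it goes through a callback, here is a minimal sketch in plain Python, with no Scrapy involved. Everything in it is a stand-in invented for illustration: the `PRODUCTS` data, the dict used in place of `Product`, and the `run_deferred` / `run_immediate` helpers, which only approximate "the callback runs after the loop has finished" and "the item is consumed as soon as it is yielded".

```python
# Plain-Python stand-in for the spider; names and data are hypothetical.
PRODUCTS = [("SKU-A", "299"), ("SKU-B", "200"), ("SKU-C", "235")]


def parse_shared():
    """Pattern from the question: one item created before the loop."""
    item = {}
    for sku, price in PRODUCTS:
        item["sku"] = sku
        item["price"] = price
        yield item  # every yield hands out the very same object


def parse_fresh():
    """Pattern from the answer: a new item on every iteration."""
    for sku, price in PRODUCTS:
        item = {}
        item["sku"] = sku
        item["price"] = price
        yield item


def run_deferred(parse):
    """Let the loop finish first, then read the items, roughly like
    callbacks that fire once the responses have come back."""
    pending = list(parse())
    return [dict(item) for item in pending]


def run_immediate(parse):
    """Read each item right after it is yielded, before the next
    iteration overwrites it."""
    return [dict(item) for item in parse()]


if __name__ == "__main__":
    print(run_deferred(parse_shared))   # three identical entries
    print(run_immediate(parse_shared))  # three distinct entries
    print(run_deferred(parse_fresh))    # three distinct entries
```

In the spider from the question, the same change applies to the first version: move `item = Product()` inside the loop and keep yielding `scrapy.Request(..., meta={'item': item})`, so that each request carries its own item to `parse_product_page`.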