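Scrapy - Why do items filled in a for loop all have the same values when accessed in another parse callback?

Date: 2017-01-21 11:02:23

Tags: scrapy

I want to scrape the links inside a for loop. In the loop I fill in an item and pass it to a callback function. But why do the items in the callback all have the same values? Here is my code.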

import scrapy
import re
from scraper.product_items import Product

class ProductSpider(scrapy.Spider):
    name = "productspider"

    start_urls = [
        'http://www.website.com/category-page/',
    ]

    def parse(self, response):
        item = Product()
        for products in response.css("div.product-card"):
            link = products.css("a::attr(href)").extract_first()
            item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
            item['price'] = products.css("div.product-card__old-price::text").extract_first()
            yield scrapy.Request(url = link, callback=self.parse_product_page, meta={'item': item})

    def parse_product_page(self, response):
        item = response.meta['item']
        item['image'] = response.css("div.productImage::attr(data-big)").extract_first()
        return item

The result looks like this:

[
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image1.jpg"},
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image2.jpg"},
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image3.jpg"},
]

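As you can see, sku and price have the same value in every result. I want each result to have its own sku and price. If I instead yield the results directly from parse, by changing the code like this: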

import scrapy
import re
from scraper.product_items import Product

class LazadaSpider(scrapy.Spider):
    name = "lazada"

    start_urls = [
        'http://www.lazada.co.id/beli-jam-tangan-kasual-pria/',
    ]

    def parse(self, response):
        item = Product()
        for products in response.css("div.product-card"):
            link = products.css("a::attr(href)").extract_first()
            item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
            item['price'] = products.css("div.product-card__old-price::text").extract_first()
            yield item

then the sku and price values are correct for every iteration:

[
{"sku": "CA199FA31FKAANID", "price": "299"},
{"sku": "SW437OTAA31QO3ANID", "price": "200"},
{"sku": "SW437OTAM1RAANID", "price": "235"},
]

1 answer:

Answer 0 (score: 1)

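You should create the item inside the for loop. Otherwise every iteration shares one and the same item object and simply overwrites its values. Because the callbacks typically run only after parse has already finished the loop, every callback then sees the values written by the last iteration. So the correct code is: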

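def parse(self, response):
    for products in response.css("div.product-card"):
        # Create a new Product on every iteration so each request carries its own item
        item = Product()
        link = products.css("a::attr(href)").extract_first()
        item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
        item['price'] = products.css("div.product-card__old-price::text").extract_first()
        # Still follow the product page to fill in the image, as in the question
        yield scrapy.Request(url=link, callback=self.parse_product_page, meta={'item': item})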
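To see the mechanism without running Scrapy, here is a minimal pure-Python simulation. It is only a sketch under simplifying assumptions: plain dicts stand in for Product, a list of pending (image, item) pairs stands in for the scheduled requests and their meta, all callbacks run after the loop has finished, and each callback's result is serialized immediately, roughly like a feed exporter. The names crawl, cards and fresh_item_per_card and the sample data are hypothetical.

```python
import json


def parse(cards, fresh_item_per_card):
    """Simulated parse(): returns pending callbacks instead of running them."""
    pending = []
    item = {}  # one shared object, as in the question's code
    for card in cards:
        if fresh_item_per_card:
            item = {}  # a new object per iteration, as in the answer's fix
        item["sku"] = card["sku"]
        item["price"] = card["price"]
        # Like yield scrapy.Request(..., meta={'item': item}): stores a reference, not a copy
        pending.append((card["image"], item))
    return pending


def parse_product_page(image, item):
    """Simulated callback: adds the image and serializes the item right away."""
    item["image"] = image
    return json.dumps(item)


def crawl(cards, fresh_item_per_card):
    # Callbacks run only after parse() has finished its loop (a simplification)
    return [json.loads(parse_product_page(image, item))
            for image, item in parse(cards, fresh_item_per_card)]


cards = [
    {"sku": "A", "price": "1", "image": "img1.jpg"},
    {"sku": "B", "price": "2", "image": "img2.jpg"},
]
print(crawl(cards, fresh_item_per_card=False))
# [{'sku': 'B', 'price': '2', 'image': 'img1.jpg'}, {'sku': 'B', 'price': '2', 'image': 'img2.jpg'}]
print(crawl(cards, fresh_item_per_card=True))
# [{'sku': 'A', 'price': '1', 'image': 'img1.jpg'}, {'sku': 'B', 'price': '2', 'image': 'img2.jpg'}]
```

With the shared object, sku and price come from the last card while image still differs, which is the same pattern as in the question's output; creating the item inside the loop gives each pending callback its own object.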