Custom fields with unconventional characters in their keys for scrapy.Item

Time: 2014-08-11 18:21:51

Tags: python scrapy

Suppose I want to define an item model named Product that contains a key named @type.

class Product(scrapy.Item):
    name = scrapy.Field()
    price = scrapy.Field()
    stock = scrapy.Field()
    @type = scrapy.Field()

Obviously, since @type is not a valid Python identifier, the definition above is illegal in Python.

Still, JSON like this is valid:

{
  "name": "Battery",
  "price": 1.00,
  "stock": 10,
  "@type": "Product"
}
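For reference, a minimal sketch (standard library only) of where the restriction lies: `@type` is rejected as an attribute name but is perfectly acceptable as a dict key, which is why the JSON form causes no trouble. The `product` and `encoded` names are illustrative.

```python
import json

# '@type' is not a valid identifier, so it cannot be written as a
# class attribute such as `@type = scrapy.Field()`.
is_valid_name = '@type'.isidentifier()
print(is_valid_name)

# As a plain string key in a dict it is unrestricted, and the json
# module serializes it like any other key.
product = {"name": "Battery", "price": 1.00, "stock": 10, "@type": "Product"}
encoded = json.dumps(product)
print(encoded)
```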

Does anyone know how to do this properly in Scrapy?

1 answer:

Answer 0 (score: 0)

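Since scrapy.Item is based on a dict and keeps its declared fields in the fields attribute, you can override __init__() and register the extra key there: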

import scrapy


class Product(scrapy.Item):
    name = scrapy.Field()
    price = scrapy.Field()
    stock = scrapy.Field()

    def __init__(self, *args, **kwargs):
        super(Product, self).__init__(*args, **kwargs)
        # '@type' cannot be declared as a class attribute, so register it by key
        self.fields['@type'] = scrapy.Field()

Example spider:

from scrapy import Item, Field
from scrapy import Spider


class Product(Item):
    name = Field()
    price = Field()
    stock = Field()

    def __init__(self, *args, **kwargs):
        super(Product, self).__init__(*args, **kwargs)
        self.fields['@type'] = Field()


class ProductSpider(Spider):
    name = "product_spider"  
    start_urls = ['http://google.com']

    def parse(self, response):
        item = Product()
        item['name'] = 'Test name'
        item['price'] = 0
        item['stock'] = True
        item['@type'] = 'Test type'
        return item

Output:

$ scrapy runspider spider1.py
2014-08-11 14:32:00-0400 [product_spider] DEBUG: Scraped from <200 http://www.google.com/>
{'@type': 'Test type', 'name': 'Test name', 'price': 0, 'stock': True}