Question

I have several Item classes that describe an object's properties:

import scrapy


class FullName(scrapy.Item):
    first = scrapy.Field()
    second = scrapy.Field()
    middle = scrapy.Field()

class Physical(scrapy.Item):
    growth = scrapy.Field()
    weight = scrapy.Field()
    hair = scrapy.Field()

I also have a main Item that describes the subject itself. As some of its fields, I want to embed the property Items above:

class Human(scrapy.Item):
    sex = scrapy.Field()
    age = scrapy.Field()
    physical = <...Physical Item>
    full_name = <...FullName Item>

so that the exported data has the corresponding nested structure:

{
age: 23,
sex: male,
full_name: {first: test, second: test, middle: test}
physical: {growth: 90, height: 190, hair: blonde},
...
}

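The nesting can be arbitrarily deep.

Can I do this with Scrapy? How should the spider be structured? I couldn't find anything about this in the Scrapy documentation on extending items and item loaders.

Or have I chosen the wrong tool, and do I need to do this by hand?

UPD. About the spider.

What should the spider's structure look like? As I understand it, the "physical" field has to be associated with the PhysicalSpider spider, which is passed the current URL. How do I do that? Please help.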

class PhysicalSpider(scrapy.Spider):
    name = "physical"

    def parse(self, response):
        item = Physical()
        item['weight'] = response.xpath('path').extract()
        yield item

class HumanSpider(scrapy.Spider):
    name = "human"
    start_urls = [
        "url1",
        "url2",
    ]

    def parse(self, response):
        item = Human()
        item['sex'] = response.xpath('path').extract()
        item['age'] = response.xpath('path')[1].extract()
        item['physical'] = PhysicalSpider(???)
        yield item

Answer 1

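import scrapy


class Human(scrapy.Item):
    sex = scrapy.Field()
    age = scrapy.Field()
    physical = scrapy.Field()   # will hold a nested Physical item
    full_name = scrapy.Field()


class Physical(scrapy.Item):
    height = scrapy.Field()


# Inside a spider callback:
p = Physical()
p['height'] = 180
h = Human()
h['physical'] = p   # a Field can hold any value, including another Item
h['sex'] = 'yes'
return h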

Result:

{'physical': {'height': 180}, 'sex': 'yes'}

Based on your spider example:

class HumanSpider(scrapy.Spider):
    name = "human"
    start_urls = [
        "url1",
    ]

    def parse(self, response):
        # Human and Physical are the Item classes defined above;
        # every key assigned here must be declared as a Field on its class.
        item = Human()
        item['sex'] = response.xpath('path').extract()
        item['age'] = response.xpath('path')[1].extract()
        physical_item = Physical()
        physical_item['height'] = response.xpath('path').extract()
        item['physical'] = physical_item
        yield item
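
Since the question mentions arbitrary nesting depth: a scrapy.Item behaves like a mapping, so nested items can be turned into plain nested dicts with a small recursive helper before exporting by hand. The following is only a rough sketch using the standard library. `to_plain` and `StandInItem` are hypothetical names invented for this illustration (the stand-in class replaces scrapy.Item so the snippet runs without Scrapy installed), and the field values are made up. If you use the itemadapter package, `ItemAdapter(item).asdict()` performs a similar recursive conversion.

```python
import json
from collections.abc import Mapping


def to_plain(value):
    """Recursively convert mappings (and lists/tuples of them) to plain dicts/lists."""
    if isinstance(value, Mapping):
        return {key: to_plain(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(val) for val in value]
    return value


class StandInItem(Mapping):
    """Hypothetical stand-in for scrapy.Item: a read-only mapping of field values."""

    def __init__(self, **fields):
        self._values = dict(fields)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


# Illustrative data only; the nesting could go deeper in the same way.
human = StandInItem(
    sex="male",
    age=23,
    full_name=StandInItem(first="test", second="test", middle="test"),
    physical=StandInItem(height=190, hair="blonde"),
)

plain = to_plain(human)
print(json.dumps(plain, indent=2))
```

The helper only looks at the mapping interface, so it should work the same way on real Item instances, however deeply they are nested.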

Scrapy: creating a complex structure (dict in dict) in the parse result
