I have several Item objects that describe properties of an entity:
import scrapy

class FullName(scrapy.Item):
    first = scrapy.Field()
    second = scrapy.Field()
    middle = scrapy.Field()

class Physical(scrapy.Item):
    growth = scrapy.Field()
    weight = scrapy.Field()
    hair = scrapy.Field()
I also have an Item for the subject these properties belong to. I want to use the property Items above as fields of that Item:
class Human(scrapy.Item):
    sex = scrapy.Field()
    age = scrapy.Field()
    physical = <...Physical Item>
    full_name = <...FullName Item>
so that the exported data has the specified nested structure:
{
    age: 23,
    sex: male,
    full_name: {first: test, second: test, middle: test},
    physical: {growth: 90, height: 190, hair: blonde},
    ...
}
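The nesting can go to any depth.
Can I do this with Scrapy? How should the spider be structured? I could not find this in the Scrapy documentation on extending Items and Item Loaders. Or have I chosen the wrong tool, and do I need to do this by hand?
UPD. About the spider.
What should the structure of the spider be? As you can see, the `physical` field needs to be associated with the spider PhysicalSpider, which is passed the current URL. How do I do that? Please help.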
class PhysicalSpider(scrapy.Spider):
    name = "physical"

    def parse(self, response):
        item = Physical()
        item['weight'] = response.xpath('path').extract()
        yield item

class HumanSpider(scrapy.Spider):
    name = "human"
    start_urls = [
        "url1",
        "url2",
    ]

    def parse(self, response):
        item = Human()
        item['sex'] = response.xpath('path').extract()
        item['age'] = response.xpath('path')[1].extract()
        item['physical'] = PhysicalSpider(???)
        yield item
Answer 0 (score: 1)
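import scrapy

class Human(scrapy.Item):
    sex = scrapy.Field()
    age = scrapy.Field()  # needed by the spider example further down
    physical = scrapy.Field()
    full_name = scrapy.Field()

class Physical(scrapy.Item):
    height = scrapy.Field()

# Illustrative helper; in a spider this would be the body of a callback.
def build_human():
    p = Physical()
    p['height'] = 180
    h = Human()
    h['physical'] = p  # a Field can hold another Item
    h['sex'] = 'yes'
    return h

print(build_human())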
Result:
{'physical': {'height': 180}, 'sex': 'yes'}
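The output above is the Item's printed form; the nested value is still an Item object. Because `scrapy.Item` exposes the mutable mapping interface, one way to get plain dicts at any depth is a small recursive converter. The following is only a sketch: `to_plain` and `FakeItem` are hypothetical names, and `FakeItem` (a `UserDict`) stands in for a Scrapy Item so the snippet runs without Scrapy installed.

```python
from collections import UserDict
from collections.abc import Mapping


class FakeItem(UserDict):
    """Stand-in for a scrapy.Item: a mapping that is not a plain dict."""


def to_plain(value):
    """Recursively convert nested mappings (and lists of them) to plain dicts."""
    if isinstance(value, Mapping):
        return {key: to_plain(inner) for key, inner in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(inner) for inner in value]
    return value


physical = FakeItem(height=180)
human = FakeItem(sex="yes", physical=physical)
plain = to_plain(human)
print(plain)
```

The recursion does not care how deep the nesting goes, which matches the "any depth" requirement in the question.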
Based on your spider example:
class HumanSpider(scrapy.Spider):
    name = "human"
    start_urls = [
        "url1",
    ]

    def parse(self, response):
        # Human must declare every field set here, including `age`.
        item = Human()
        item['sex'] = response.xpath('path').extract()
        item['age'] = response.xpath('path')[1].extract()
        # Build the nested item from the same response and assign it to the field.
        physical_item = Physical()
        physical_item['height'] = response.xpath('path').extract()
        item['physical'] = physical_item
        yield item