def parse(self, response):
    for link in LinkExtractor(restrict_xpaths="BLAH").extract_links(response)[:-1]:
        yield Request(link.url)
    # One loader is created per page and reused for every row below.
    l = MytemsLoader()
    l.add_value('main1', some xpath)
    l.add_value('main2', some xpath)
    l.add_value('main3', some xpath)
    rows = response.xpath("table[@id='BLAH']/tbody[contains(@id, 'BLOB')]")
    for row in rows:
        l.add_value('table1', some xpath based on rows)
        l.add_value('table2', some xpath based on rows)
        l.add_value('main3', some xpath based on rows)
        yield l.load_item()
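I am using an item loader because I want to preprocess the fields and handle any null values easily. Each row of the table should become one entity that has the main1, main2, main3, ... fields plus its own fields. However, the code above overwrites the l item loader and returns only the last row for each main page.
Question: How can I use item loaders to combine the main page data with each table row entry? If I use two item loaders, one for each part, how can they be combined?
For future reference: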
def newparse(self, response):
    for link in LinkExtractor(restrict_xpaths="BLAH").extract_links(response)[:-1]:
        yield Request(link.url)
    # Load the page-level fields once.
    ml = MyitemLoader()
    ml.add_value('main1', some xpath)
    ml.add_value('main2', some xpath)
    ml.add_value('main3', some xpath)
    main_item = ml.load_item()
    rows = response.xpath("table[@id='BLAH']/tbody[contains(@id, 'BLOB')]")
    for row in rows:
        # A new loader per row, seeded with the page-level item.
        bl = MyitemLoader(item=main_item, selector=row)
        bl.add_value('table1', some xpath based on row)
        bl.add_value('table2', some xpath based on row)
        bl.add_value('main3', some xpath based on row)
        yield bl.load_item()
Answer 0 (score: 4)
You need to instantiate a new ItemLoader inside the loop and pass it the item argument:
l = MytemsLoader()
l.add_value('main1', some xpath)
l.add_value('main2', some xpath)
l.add_value('main3', some xpath)
item = l.load_item()
rows = response.xpath("table[@id='BLAH']/tbody[contains(@id, 'BLOB')]")
for row in rows:
    # New loader for each row, starting from the main page item.
    l = MytemsLoader(item=item)
    l.add_value('table1', some xpath based on rows)
    l.add_value('table2', some xpath based on rows)
    l.add_value('main3', some xpath based on rows)
    yield l.load_item()
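
The same pattern can be tried without Scrapy. Below is a minimal sketch in plain Python, not Scrapy's API: SimpleLoader and combine are hypothetical names made up for illustration, and the "keep the first non-empty value" rule is an assumption standing in for whatever processors your loader defines. The sketch also copies the main item for each row. That copy is an added precaution rather than part of the answer above, since a loader that receives item= populates that same object.

```python
from copy import deepcopy


class SimpleLoader:
    """Hypothetical stand-in for an item loader (not Scrapy's API)."""

    def __init__(self, item=None):
        # Work on a private copy so the shared main item is never mutated.
        self.item = deepcopy(item) if item is not None else {}

    def add_value(self, field, value):
        # Skip empty values and keep the first value seen for a field.
        if value is not None and field not in self.item:
            self.item[field] = value

    def load_item(self):
        return self.item


def combine(main_values, rows):
    """Yield one item per table row, each carrying the page-level fields."""
    main_loader = SimpleLoader()
    for field, value in main_values.items():
        main_loader.add_value(field, value)
    main_item = main_loader.load_item()

    for row in rows:
        # A fresh loader per row, seeded with the main item.
        row_loader = SimpleLoader(item=main_item)
        for field, value in row.items():
            row_loader.add_value(field, value)
        yield row_loader.load_item()


items = list(combine(
    {"main1": "A", "main2": "B"},
    [{"table1": 1, "table2": 2}, {"table1": 3, "table2": None}],
))
```

Each element of items is a separate dict that holds the main fields plus that row's own fields. In real Scrapy code a comparable precaution would probably be to pass a copy of the item to each row loader (for example item.copy(), assuming the item is a scrapy.Item or a dict), though whether that is needed depends on how your processors and pipelines treat the item.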