Question

https://doc.scrapy.org/en/latest/topics/loaders.html#using-item-loaders-to-populate-items

from scrapy.loader import ItemLoader
from myproject.items import Product

def parse(self, response):
    l = ItemLoader(item=Product(), response=response)
    l.add_xpath('name', '//div[@class="product_name"]')
    l.add_xpath('name', '//div[@class="product_title"]')
    l.add_xpath('price', '//p[@id="price"]')
    l.add_css('stock', 'p#stock')
    l.add_value('last_updated', 'today') # you can also use literal values
    return l.load_item()

But if I scrape the name, price, and so on from a second page, how do I add that information to l.load_item()?

I added a loop, but if I put return at the end, the loop only runs once.
What is the correct way to do this?

Answer 1

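Just replace return l.load_item() with yield l.load_item(). A return ends the callback on the first pass through the loop, while yield hands back one item and lets the loop continue.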

Example:

for block in response.css('.blocks'):
    product_name = block.css('div.product_name').extract_first()
    product_title = block.css('div.product_title').extract_first()
    price = block.css('p#price').extract_first()
    stock = block.css('p#stock').extract_first()
    yield Product(
        product_name=product_name,
        product_title=product_title,
        price=price,
        stock=stock,
        last_updated='today'
    )

If you use an ItemLoader, you must create a new loader on every iteration:

for block in blocks:
    l = ItemLoader(item=Product(), response=response)
    ...
    yield l.load_item()

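“In callback functions, you parse the response (web page) and return either dicts with extracted data, Item objects, Request objects, or an iterable of these objects.” See the Scrapy documentation.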

yield turns the callback into a generator, so it produces an iterable of Product Item objects.
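
The difference between return and yield inside a loop can be seen without Scrapy at all. The sketch below uses plain dicts and two hypothetical helper functions (parse_with_return and parse_with_yield are illustrative names, not Scrapy APIs):

```python
def parse_with_return(blocks):
    for block in blocks:
        # return exits the function during the first iteration
        return {'name': block}


def parse_with_yield(blocks):
    for block in blocks:
        # yield hands back one item, then the loop resumes
        yield {'name': block}


blocks = ['Desk', 'Chair', 'Lamp']
first_only = parse_with_return(blocks)      # a single dict
all_items = list(parse_with_yield(blocks))  # one dict per block
print(first_only)
print(all_items)
```

Scrapy iterates over whatever the callback produces, so a generator callback delivers every item from the loop instead of only the first one.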

Looping in a Scrapy ItemLoader
