Question

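I'm very new to Scrapy. I know items are used to hold scraped data, but I can't understand the difference between items and item loaders. I tried reading some example code, and it uses item loaders instead of items to store the data, and I can't see why. The Scrapy documentation isn't clear enough for me. Can anyone give a simple explanation (ideally with an example) of when to use item loaders and what additional facilities they provide over plain items?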

Answer 1

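I really like the official explanation in the documentation:

Item Loaders provide a convenient mechanism for populating scraped Items. Even though Items can be populated using their own dictionary-like API, Item Loaders provide a much more convenient API for populating them from a scraping process, by automating some common tasks like parsing the raw extracted data before assigning it.

In other words, Items provide the container of scraped data, while Item Loaders provide the mechanism for populating that container.

That last paragraph should answer your question. Item loaders are great because they give you lots of processing shortcuts and let you reuse code, which keeps everything tidy, clean and easy to understand.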
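To make the container/mechanism split concrete, here is a deliberately simplified, pure-Python model that does not use Scrapy at all. `ToyLoader`, its `add_value` and `load_item` methods, and the `clean_text` helper are hypothetical stand-ins that only mimic the shape of the real workflow: with a plain container you must clean each value before assigning it, whereas a loader collects raw values per field and cleans them once, when the item is built.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Processor = Callable[[List[Any]], Any]

# The Item side: a plain dict used as a container. Whoever assigns to it
# must already have cleaned the value.
item: Dict[str, Any] = {}
item["full_name"] = "John Snow"


class ToyLoader:
    """Hypothetical, heavily simplified stand-in for an item loader.

    Not Scrapy's implementation: it only shows the division of labour.
    """

    def __init__(self, output_processors: Dict[str, Processor]) -> None:
        self.output_processors = output_processors
        self.collected: Dict[str, List[Any]] = defaultdict(list)

    def add_value(self, field: str, values: List[Any]) -> None:
        # Raw values are only collected here; nothing is cleaned yet.
        self.collected[field].extend(values)

    def load_item(self) -> Dict[str, Any]:
        # Cleaning happens once per field, when the item is built.
        # Fields without a processor keep their raw list of values.
        return {
            field: self.output_processors.get(field, list)(values)
            for field, values in self.collected.items()
        }


def clean_text(values: List[str]) -> str:
    # Strip whitespace, drop blank fragments, join the rest with spaces.
    return " ".join(v.strip() for v in values if v.strip())


loader = ToyLoader({"full_name": clean_text})
loader.add_value("full_name", ["John\n", "\n\t  "])
loader.add_value("full_name", ["  Snow"])
print(loader.load_item())  # {'full_name': 'John Snow'}
```

Because the cleanup rule is attached to the field rather than repeated at every assignment, values can be added from several places and still come out cleaned the same way.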

Here is a comparison. Say we want to scrape this item:

from scrapy import Item, Field

class MyItem(Item):
    full_name = Field()
    bio = Field()
    age = Field()
    weight = Field()
    height = Field()

The items-only approach would look something like this:

# spider callback: every field is extracted and cleaned by hand
def parse(self, response):
    item = MyItem()
    full_name = response.xpath("//div[contains(@class,'name')]/text()").extract()
    # e.g. returns an ugly ['John\n', '\n\t  ', '  Snow']
    item['full_name'] = ' '.join(i.strip() for i in full_name if i.strip())
    bio = response.xpath("//div[contains(@class,'bio')]/text()").extract()
    item['bio'] = ' '.join(i.strip() for i in bio if i.strip())
    age = response.xpath("//div[@class='age']/text()").extract_first(0)
    item['age'] = int(age)
    weight = response.xpath("//div[@class='weight']/text()").extract_first(0)
    item['weight'] = int(weight)
    height = response.xpath("//div[@class='height']/text()").extract_first(0)
    item['height'] = int(height)
    return item

versus the Item Loaders approach:

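# define once, e.g. in items.py
from scrapy.loader import ItemLoader
# older Scrapy versions: from scrapy.loader.processors import ...
from itemloaders.processors import Compose, MapCompose, Join, TakeFirst

# strip every fragment, drop blank ones (MapCompose discards None), join with spaces
clean_text = Compose(MapCompose(lambda v: v.strip() or None), Join())
# take the first non-empty value and convert it to int
to_int = Compose(TakeFirst(), int)

class MyItemLoader(ItemLoader):
    default_item_class = MyItem
    full_name_out = clean_text
    bio_out = clean_text
    age_out = to_int
    weight_out = to_int
    height_out = to_int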

# reuse the loader in as many callbacks and spiders as you like
def parse(self, response):
    loader = MyItemLoader(response=response)
    loader.add_xpath('full_name', "//div[contains(@class,'name')]/text()")
    loader.add_xpath('bio', "//div[contains(@class,'bio')]/text()")
    loader.add_xpath('age', "//div[@class='age']/text()")
    loader.add_xpath('weight', "//div[@class='weight']/text()")
    loader.add_xpath('height', "//div[@class='height']/text()")
    return loader.load_item()
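If the processor chains are hard to read, here is a rough plain-Python model of what `clean_text` and `to_int` do to a field's collected values. `map_compose`, `join`, `take_first` and `compose` are hypothetical simplifications written for this illustration; the real `MapCompose`, `Join`, `TakeFirst` and `Compose` also pass loader context and accept extra options. The stripping function returns None for blank fragments because a MapCompose-style step drops None results, which keeps whitespace-only strings out of the joined text.

```python
from typing import Any, Callable, Iterable, List, Optional


def map_compose(func: Callable[[Any], Any]) -> Callable[[Iterable[Any]], List[Any]]:
    # Apply func to every value; results that are None are dropped.
    def process(values: Iterable[Any]) -> List[Any]:
        return [r for r in (func(v) for v in values) if r is not None]
    return process


def join(separator: str = " ") -> Callable[[Iterable[str]], str]:
    # Join all values into one string.
    return lambda values: separator.join(values)


def take_first(values: Iterable[Any]) -> Optional[Any]:
    # First value that is neither None nor an empty string.
    for v in values:
        if v is not None and v != "":
            return v
    return None


def compose(*funcs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    # Feed the whole value through each function in turn;
    # stop early (returning None) once a step has produced None.
    def process(value: Any) -> Any:
        for f in funcs:
            if value is None:
                return None
            value = f(value)
        return value
    return process


clean_text = compose(map_compose(lambda v: v.strip() or None), join())
to_int = compose(take_first, int)

print(clean_text(["John\n", "\n\t  ", "  Snow"]))  # John Snow
print(to_int(["42", "43"]))                        # 42
print(to_int([]))                                  # None
```

Note that the whole list of extracted values is what gets passed to an output processor, which is why `to_int` needs a "take first" step before `int`.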

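As you can see, the Item Loader version is cleaner and much easier to extend. Suppose you had 20 more fields, many of them sharing the same processing logic: without Item Loaders you would be copying (and debugging) the same cleanup code over and over. Item Loaders are great, and you should use them!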
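For the "20 more fields" case, Scrapy's `ItemLoader` lets you set a class-level `default_output_processor` and only declare `<field>_out` for the exceptions. The sketch below models that idea in plain Python without Scrapy; `build_item` and the sample values are hypothetical, chosen only to show that text fields need no per-field declaration while the numeric ones are overridden.

```python
from typing import Any, Callable, Dict, List, Optional

Processor = Callable[[List[str]], Any]


def clean_text(values: List[str]) -> str:
    # Strip whitespace, drop blank fragments, join the rest with spaces.
    return " ".join(v.strip() for v in values if v.strip())


def to_int(values: List[str]) -> Optional[int]:
    # Convert the first non-blank value to int; None if there is none.
    for v in values:
        if v.strip():
            return int(v)
    return None


def build_item(raw: Dict[str, List[str]],
               default: Processor,
               overrides: Dict[str, Processor]) -> Dict[str, Any]:
    # Every field uses the shared default unless it has an explicit override,
    # the same idea as a default output processor plus per-field <field>_out.
    return {name: overrides.get(name, default)(values)
            for name, values in raw.items()}


raw = {
    "full_name": ["John\n", "\n\t  ", "  Snow"],
    "bio": [" Knows ", "nothing "],
    "age": [" 23 "],
    "weight": ["80"],
    "height": ["180"],
}
numeric = {"age": to_int, "weight": to_int, "height": to_int}
print(build_item(raw, default=clean_text, overrides=numeric))
```

Adding another text field here means adding data only; no new processing line is needed, which is exactly the saving the loader's default processor gives you.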

Items vs Item Loaders in scrapy
