Scrapy ItemLoader: how to load from an already-populated Item() / dict

Asked: 2014-12-14 16:07:24

Tags: python scrapy

If I already have a populated Item(), how do I run it through an ItemLoader?

For example:

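from scrapy import Item, Field
# Scrapy 0.24-era import paths; Scrapy 1.0 moved them to scrapy.loader
# and scrapy.loader.processors
from scrapy.contrib.loader import ItemLoader
from scrapy.contrib.loader.processor import MapCompose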

class MyLoader(ItemLoader):
    desc_out = MapCompose(unicode.strip)

class MyItem(Item):
    desc = Field()


item = MyItem()
item['desc'] = "\r\t\n        some text            \t\n"
loader = MyLoader(item)
loader.load_item()
#output: {'desc': "\r\t\n        some text            \t\n"}


newloader = MyLoader(item=MyItem(**{'desc': '\n\ta\n'}))
newloader.load_item()
#output is still unprocessed: {'desc': '\n\ta\n'}

I want the loader to apply its output processing, but in this example the whitespace is not stripped.

2 answers:

Answer 0 (score: 2)

You can iterate over the item's fields and add each one to the loader with add_value():

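item = MyItem()
# unicode literal: unicode.strip rejects a plain Python 2 str
item['desc'] = u"\r\t\n        some text            \t\n"

loader = MyLoader()

# values collected through add_value() are what desc_out is applied to
for k, v in item.items():
    loader.add_value(k, v)

loader.load_item()  # {'desc': [u'some text']} (MapCompose returns a list)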
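The loop above can be wrapped in a small helper so it is reusable. What follows is only a sketch, written for Python 3, and it assumes nothing more than a loader exposing add_value() and load_item(). The names load_through and FakeLoader are hypothetical: FakeLoader is a stand-in that lets the snippet run without Scrapy installed, and is not Scrapy code.

```python
def load_through(loader, item):
    """Feed every field of an already-populated item through `loader`.

    `loader` only needs add_value() and load_item(); `item` only needs
    items(), so a plain dict works as well as a scrapy Item.
    """
    for field_name, value in item.items():
        loader.add_value(field_name, value)
    return loader.load_item()


class FakeLoader:
    """Hypothetical stand-in for an ItemLoader subclass, NOT Scrapy code.

    It strips every collected string on output, so the helper can be
    tried without Scrapy installed.
    """

    def __init__(self):
        self._values = {}  # field name -> list of collected values

    def add_value(self, field_name, value):
        self._values.setdefault(field_name, []).append(value)

    def load_item(self):
        return {name: [v.strip() for v in values]
                for name, values in self._values.items()}


result = load_through(FakeLoader(), {"desc": "\r\t\n   some text   \t\n"})
print(result)
```

With a real loader you would pass an instance of your own ItemLoader subclass, for example load_through(MyLoader(), item).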

Answer 1 (score: 2)

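The loader only processes values that you add through its add_* methods (add_value(), add_xpath(), add_css()). You can subclass ItemLoader so that it re-adds the initial item's fields itself: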

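from scrapy.contrib.loader import ItemLoader  # scrapy.loader in Scrapy 1.0+

class ReloadableItemLoader(ItemLoader):
    def __init__(self, *args, **kwargs):
        super(ReloadableItemLoader, self).__init__(*args, **kwargs)

        # re-add the fields of the initial item so that the input and
        # output processors are applied to them as well
        for key, value in self.item.iteritems():
            self.add_value(key, value)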
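A toy model of the loader may make it clearer why the values have to pass through an add_* method. This is a simplified Python 3 sketch that assumes, as this answer describes, that load_item() only post-processes values collected through add_value(). ToyLoader and ReloadableToyLoader are hypothetical names, and this is not Scrapy's actual implementation.

```python
class ToyLoader:
    """Toy model: only values collected via add_value() get processed."""

    def __init__(self, item=None):
        self.item = {} if item is None else item
        self._values = {}  # field name -> list of collected values

    def output_processor(self, values):
        # Plays the role of desc_out = MapCompose(strip).
        return [v.strip() for v in values]

    def add_value(self, field_name, value):
        self._values.setdefault(field_name, []).append(value)

    def load_item(self):
        # Fields already sitting in self.item are never visited here
        # unless they were also collected through add_value().
        for field_name, values in self._values.items():
            self.item[field_name] = self.output_processor(values)
        return self.item


class ReloadableToyLoader(ToyLoader):
    """Same idea as the subclass above: re-add the initial item's fields."""

    def __init__(self, item=None):
        super().__init__(item)
        for field_name, value in list(self.item.items()):
            self.add_value(field_name, value)


raw = "\r\t\n   some text   \t\n"
untouched = ToyLoader({"desc": raw}).load_item()
processed = ReloadableToyLoader({"desc": raw}).load_item()
print(untouched)
print(processed)
```

The first print shows the item exactly as it was passed in. The second shows the stripped value, because the subclass routed the field through add_value() before load_item() ran.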