How to get the field order in a Scrapy item

Time: 2016-12-23 01:12:03

Tags: python scrapy

I'd like to keep a reference to the order of the field names in a Scrapy item. Where is that order stored?

>>> dir(item)
Out[7]: 
['_MutableMapping__marker',
 '__abstractmethods__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__doc__',
 '__eq__',
 '__format__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__hash__',
 '__init__',
 '__iter__',
 '__len__',
 '__metaclass__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_cache',
 '_abc_negative_cache',
 '_abc_negative_cache_version',
 '_abc_registry',
 '_class',
 '_values',
 'clear',
 'copy',
 'fields',
 'get',
 'items',
 'iteritems',
 'iterkeys',
 'itervalues',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

I tried item.keys(), but the keys come back in no particular order.

1 answer:

Answer 0: (score: 7)

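The Item class has a dict interface and stores its values in the _values dict, which does not track key order (https://github.com/scrapy/scrapy/blob/1.5/scrapy/item.py#L53). I believe you can subclass Item and override the __init__ method so that this container is an OrderedDict: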

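from collections import OrderedDict

import six  # needed for six.iteritems below
from scrapy import Item

class OrderedItem(Item):
    def __init__(self, *args, **kwargs):
        # Use an OrderedDict as the value container so that keys are
        # kept in the order in which they are assigned.
        self._values = OrderedDict()
        if args or kwargs:  # avoid creating dict for most common case
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v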

This item preserves the order in which values are assigned:

In [28]: class SomeItem(OrderedItem):
    ...:     a = Field()
    ...:     b = Field()
    ...:     c = Field()
    ...:     d = Field()
    ...: 
    ...: i = SomeItem()
    ...: i['b'] = 'bbb'
    ...: i['a'] = 'aaa'
    ...: i['d'] = 'ddd'
    ...: i['c'] = 'ccc'
    ...: i.items()
    ...: 
Out[28]: [('b', 'bbb'), ('a', 'aaa'), ('d', 'ddd'), ('c', 'ccc')]
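
Note that the output above follows assignment order, not the order in which the fields were declared on the class. If declaration order is what you need, one option is to keep an explicit list of field names and order the pairs by it. The following is only a minimal sketch: FIELD_ORDER and ordered_pairs are hypothetical names, not part of Scrapy, and a plain dict stands in for the item so the snippet runs without Scrapy installed.

```python
# Hypothetical: the desired field order, maintained by hand next to the item class.
FIELD_ORDER = ('a', 'b', 'c', 'd')


def ordered_pairs(item, field_order=FIELD_ORDER):
    """Return the (key, value) pairs of a dict-like item, ordered by field_order.

    Fields that have no value in the item are skipped. Keys that are not
    listed in field_order are appended at the end, sorted by name.
    """
    listed = [(k, item[k]) for k in field_order if k in item]
    extra = [(k, item[k]) for k in sorted(item.keys()) if k not in field_order]
    return listed + extra


# A plain dict standing in for a Scrapy item (assumption for illustration).
item = {'b': 'bbb', 'a': 'aaa', 'd': 'ddd', 'c': 'ccc'}
print(ordered_pairs(item))
# [('a', 'aaa'), ('b', 'bbb'), ('c', 'ccc'), ('d', 'ddd')]
```

Because the function only relies on `in`, `[]` and `keys()`, it should work the same way on any object with a dict interface, which is what the answer above says Item provides.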