I would like to keep a reference to the order of the field names in a Scrapy item. Where is that stored?
>>> dir(item)
Out[7]:
['_MutableMapping__marker',
'__abstractmethods__',
'__class__',
'__contains__',
'__delattr__',
'__delitem__',
'__dict__',
'__doc__',
'__eq__',
'__format__',
'__getattr__',
'__getattribute__',
'__getitem__',
'__hash__',
'__init__',
'__iter__',
'__len__',
'__metaclass__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__setitem__',
'__sizeof__',
'__slots__',
'__str__',
'__subclasshook__',
'__weakref__',
'_abc_cache',
'_abc_negative_cache',
'_abc_negative_cache_version',
'_abc_registry',
'_class',
'_values',
'clear',
'copy',
'fields',
'get',
'items',
'iteritems',
'iterkeys',
'itervalues',
'keys',
'pop',
'popitem',
'setdefault',
'update',
'values']
I tried item.keys(), but the keys come back unordered.
Answer 0 (score: 7)
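The Item class has a dict interface and stores its values in the _values dict, which does not keep track of key order (https://github.com/scrapy/scrapy/blob/1.5/scrapy/item.py#L53). I believe you could subclass Item and override the __init__ method so that this container is an OrderedDict: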
import six
from collections import OrderedDict
from scrapy import Item, Field

class OrderedItem(Item):
    def __init__(self, *args, **kwargs):
        # Replace the plain dict container with one that remembers insertion order.
        self._values = OrderedDict()
        if args or kwargs:  # avoid creating dict for most common case
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v
The item then preserves the order in which the values were assigned:
In [28]: class SomeItem(OrderedItem):
    ...:     a = Field()
    ...:     b = Field()
    ...:     c = Field()
    ...:     d = Field()
    ...:
    ...: i = SomeItem()
    ...: i['b'] = 'bbb'
    ...: i['a'] = 'aaa'
    ...: i['d'] = 'ddd'
    ...: i['c'] = 'ccc'
    ...: i.items()
    ...:
Out[28]: [('b', 'bbb'), ('a', 'aaa'), ('d', 'ddd'), ('c', 'ccc')]
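If Scrapy is not at hand, the same idea can be tried in isolation. The sketch below does not use Scrapy at all: MiniItem and OrderedMiniItem are hypothetical stand-ins that only mimic the pattern described above (a dict interface backed by a _values container), so treat it as an illustration of the technique rather than of Scrapy's actual implementation.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class MiniItem(MutableMapping):
    """Hypothetical stand-in for scrapy.Item: a dict interface over _values."""

    def __init__(self, *args, **kwargs):
        self._values = {}
        for k, v in dict(*args, **kwargs).items():
            self[k] = v

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


class OrderedMiniItem(MiniItem):
    """Same interface, but the backing container is an OrderedDict."""

    def __init__(self, *args, **kwargs):
        # Deliberately skip MiniItem.__init__ so the container is swapped
        # before any value is stored.
        self._values = OrderedDict()
        for k, v in dict(*args, **kwargs).items():
            self[k] = v


item = OrderedMiniItem()
item["b"] = "bbb"
item["a"] = "aaa"
item["d"] = "ddd"
item["c"] = "ccc"

# Keys are reported in assignment order, not in declaration or sorted order.
ordered_keys = list(item.keys())
print(ordered_keys)
```

Note that on Python 3.7 and later a plain dict also preserves insertion order, so the OrderedDict swap mainly matters on older interpreters such as the Python 2 environment shown in the question.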