How are dict objects pickled?

Time: 2012-04-28 15:17:33

Tags: python pickle

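After reading the pickle documentation, I got the impression that a class needs to implement either __reduce__ or __getstate__ to be pickled correctly. But how does pickling of dictionaries work, then? They don't have either of those attributes: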

> dict(a=1).__reduce__()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/daniyar/work/Apr24/<ipython-input-30-bc1cbd43305b> in <module>()
----> 1 dict(a=1).__reduce__()

/usr/lib/python2.6/copy_reg.pyc in _reduce_ex(self, proto)
     68     else:
     69         if base is self.__class__:
---> 70             raise TypeError, "can't pickle %s objects" % base.__name__
     71         state = base(self)
     72     args = (self.__class__, base, state)

TypeError: can't pickle dict objects



> dict(a=1).__getstate__()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/daniyar/work/Apr24/<ipython-input-31-00932fb40067> in <module>()
----> 1 dict(a=1).__getstate__()

AttributeError: 'dict' object has no attribute '__getstate__'

Also, how do classes derived from dict get pickled?

5 answers:

Answer 0: (score: 8)

The pickle module handles a number of types "natively". Types it doesn't handle natively need to implement the "pickle protocol". Dicts and simple subclasses of dict are handled natively.
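One way to see this "native" handling is to look at the opcodes inside a pickled dict. The sketch below assumes Python 3 (the tracebacks in the question come from Python 2.6) and uses `pickletools.genops` from the standard library; the variable names are only illustrative.

```python
import pickle
import pickletools

data = {"a": 1, "b": 2}
payload = pickle.dumps(data, protocol=2)

# Collect the names of the opcodes that make up the pickle stream.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]

# A plain dict is written with dedicated dict opcodes. There is no GLOBAL
# lookup and no REDUCE call in the stream, so no __reduce__ result is needed.
uses_reduce = "REDUCE" in opcode_names or "GLOBAL" in opcode_names

restored = pickle.loads(payload)
print(opcode_names)
print(uses_reduce, restored == data)
```

The stream starts from an empty dict and fills it with items, which is why the missing `__reduce__` support on dict itself never matters.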

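Answer 1: (score: 4)

The __reduce__ and __getstate__ methods are the low-level hooks of pickling. You implement them on a custom class only when it needs special handling from the interpreter.

For example, if an instance of an extension class sits inside a dictionary you are trying to pickle, and that class does not implement the methods that say how to pickle it, the whole dictionary becomes unpicklable.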
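A minimal sketch of that failure mode, assuming Python 3. A `threading.Lock` stands in here for "an object whose type does not support pickling"; it is not the extension class the answer has in mind, and `can_pickle` is a hypothetical helper.

```python
import pickle
import threading

plain = {"a": 1, "b": [1, 2, 3]}
poisoned = {"a": 1, "lock": threading.Lock()}  # one unpicklable value


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


print(can_pickle(plain))     # dict holding only built-in types
print(can_pickle(poisoned))  # same dict shape plus one bad value
```

Pickling is all or nothing: there is no partial result for the keys that could have been serialized.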

The interpreter knows how to pickle the built-in types, and to pickle a dictionary you use pickle.dump or pickle.dumps rather than calling __reduce__ or __getstate__ yourself.

Answer 2: (score: 3)

Pickling requires neither __reduce__ nor __getstate__. Those are methods you can use to control pickling, but pickle works on built-in types without them.
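The same holds for ordinary user-defined classes. As a small sketch (Python 3 assumed; `Point` is a hypothetical class that defines neither method), the default machinery is enough for a round trip:

```python
import pickle


class Point(object):
    """Hypothetical class with no __reduce__ or __getstate__ of its own."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


original = Point(1, 2)
clone = pickle.loads(pickle.dumps(original))

# By default the instance attributes are saved and restored as they are.
print(type(clone).__name__, clone.x, clone.y)
```

The class has to be importable by name when the pickle is loaded, because only a reference to it is stored.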

Answer 3: (score: 1)

A helpful answer I got from here

Below is what goes inside __getstate__ and __setstate__. Even if you can't use it as-is right away, it gives you a starting point:

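def __getstate__(self):
    # Return a copy of the instance attributes as the state to pickle.
    result = self.__dict__.copy()
    return result

def __setstate__(self, state):
    # Restore the instance attributes from the unpickled state.
    self.__dict__ = state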
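To show where these two hooks become useful, here is a sketch of a complete class built on the same pattern. It assumes Python 3, and `Tally` and its `_lock` attribute are made up for illustration: the lock cannot be pickled, so `__getstate__` leaves it out and `__setstate__` creates a fresh one.

```python
import pickle
import threading


class Tally(object):
    """Hypothetical class holding one attribute that cannot be pickled."""

    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()  # locks are not picklable

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]  # leave the lock out of the pickle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()  # recreate the lock on load


tally = Tally()
tally.count = 5
restored = pickle.loads(pickle.dumps(tally))
print(restored.count)
```

Without the two methods, `pickle.dumps(tally)` would fail on the lock.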

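Answer 4: (score: 1)

All good answers, but they ignore this part of the question:

    "Also, how do classes derived from dict get pickled?"

Like any other class, they are pickled by reference. If you look at the pickle, you can see what Python is doing.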

>>> class MyDict(dict):
...   def __repr__(self):
...     return "MyDict({})".format(dict(i for i in self.items()))
... 
>>> m = MyDict(a=1,b=2)
>>> m
MyDict({'a': 1, 'b': 2})
>>> import pickle
>>> # reconstructor called on class MyDict that lives in __main__
>>> # and contains a __builtin__ dict with contents ('a' and 'b')
>>> pickle.dumps(m)
"ccopy_reg\n_reconstructor\np0\n(c__main__\nMyDict\np1\nc__builtin__\ndict\np2\n(dp3\nS'a'\np4\nI1\nsS'b'\np5\nI2\nstp6\nRp7\n."
>>> m.clear()
>>> # removing the contents, to show how that affects the pickle
>>> pickle.dumps(m)
'ccopy_reg\n_reconstructor\np0\n(c__main__\nMyDict\np1\nc__builtin__\ndict\np2\n(dp3\ntp4\nRp5\n.'
>>> # now, just looking at the class itself, you can see it's by reference
>>> pickle.dumps(MyDict)
'c__main__\nMyDict\np0\n.'

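Or we can do the same thing, but inspect the disassembled pickle. You can see exactly which instructions are stored.

>>> import pickletools
>>> pickletools.dis(pickle.dumps(m))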
    0: c    GLOBAL     'copy_reg _reconstructor'
   25: p    PUT        0
   28: (    MARK
   29: c        GLOBAL     '__main__ MyDict'
   46: p        PUT        1
   49: c        GLOBAL     '__builtin__ dict'
   67: p        PUT        2
   70: (        MARK
   71: d            DICT       (MARK at 70)
   72: p        PUT        3
   75: t        TUPLE      (MARK at 28)
   76: p    PUT        4
   79: R    REDUCE
   80: p    PUT        5
   83: .    STOP
highest protocol among opcodes = 0
>>> pickletools.dis(pickle.dumps(MyDict))
    0: c    GLOBAL     '__main__ MyDict'
   17: p    PUT        0
   20: .    STOP
highest protocol among opcodes = 0
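The same inspection can be done programmatically. The sketch below assumes Python 3 and protocol 0, and uses `pickletools.genops` to collect only the by-name lookups; the exact module names in the output depend on the Python version and protocol, so none are asserted here.

```python
import pickle
import pickletools


class MyDict(dict):
    """Same idea as the MyDict above, without the custom __repr__."""


m = MyDict(a=1, b=2)
payload = pickle.dumps(m, protocol=0)

# Each GLOBAL opcode is a lookup by "module name"; collect the targets.
global_refs = [arg for opcode, arg, pos in pickletools.genops(payload)
               if opcode.name == "GLOBAL"]
print(global_refs)

restored = pickle.loads(payload)
```

The class appears only as a module and name pair, next to the reconstructor function and the built-in dict base.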

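The class is definitely stored by reference, even though it derives from dict rather than object. The reference is by name, which means that once the __main__ session is closed, the class definition is lost and any pickle that depends on MyDict will fail to load.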
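A sketch of that failure, assuming Python 3. Deleting the name from the module is used here only to simulate a fresh session in which the class was never defined; the variable names are illustrative.

```python
import pickle


class MyDict(dict):
    """Stand-in for the MyDict class defined earlier in this answer."""


payload = pickle.dumps(MyDict(a=1, b=2))

# The stream stores the class name, not the class body.
name_in_payload = b"MyDict" in payload

# While the name still resolves, loading works.
loaded_ok = isinstance(pickle.loads(payload), MyDict)

# Simulate a new session: the name can no longer be looked up.
saved_class = MyDict
del MyDict
try:
    pickle.loads(payload)
    load_failed = False
except (AttributeError, pickle.UnpicklingError):
    load_failed = True
finally:
    MyDict = saved_class  # put the name back

print(name_in_payload, loaded_ok, load_failed)
```

In practice this is why classes meant to be unpickled later usually live in an importable module rather than in an interactive session.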

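Now, let's look at the dict. Pickling a dict first relies on Python knowing how to serialize a fundamental object like dict (as stated in the other answers), and then serializes the contents. You can see that it has two strings inside, which Python also natively knows how to serialize.

This means that if you have unserializable objects in your dict, pickling will fail.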

>>> d = dict(a=1, b=2)
>>> d['c'] = MyDict.__repr__
>>> d
{'a': 1, 'c': <unbound method MyDict.__repr__>, 'b': 2}
>>> pickle.dumps(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 663, in _batch_setitems
    save(v)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle instance method objects

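By the way, we can do better with a better serializer. Using dill instead of pickle lets you serialize most objects. As you can see below, the resulting pickle of the dict is much more complicated.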

>>> import dill
>>> dill.dumps(d)
'\x80\x02}q\x00(U\x01aq\x01K\x01U\x01cq\x02cdill.dill\n_load_type\nq\x03U\nMethodTypeq\x04\x85q\x05Rq\x06cdill.dill\n_create_function\nq\x07(cdill.dill\n_unmarshal\nq\x08T$\x01\x00\x00c\x01\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x00C\x00\x00\x00s#\x00\x00\x00d\x01\x00j\x00\x00t\x01\x00d\x02\x00\x84\x00\x00|\x00\x00j\x02\x00\x83\x00\x00D\x83\x01\x00\x83\x01\x00\x83\x01\x00S(\x03\x00\x00\x00Ns\n\x00\x00\x00MyDict({})c\x01\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00s\x00\x00\x00s\x15\x00\x00\x00|\x00\x00]\x0b\x00}\x01\x00|\x01\x00V\x01q\x03\x00d\x00\x00S(\x01\x00\x00\x00N(\x00\x00\x00\x00(\x02\x00\x00\x00t\x02\x00\x00\x00.0t\x01\x00\x00\x00i(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>s\t\x00\x00\x00<genexpr>\x03\x00\x00\x00s\x02\x00\x00\x00\x06\x00(\x03\x00\x00\x00t\x06\x00\x00\x00formatt\x04\x00\x00\x00dictt\x05\x00\x00\x00items(\x01\x00\x00\x00t\x04\x00\x00\x00self(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>t\x08\x00\x00\x00__repr__\x02\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\t\x85q\nRq\x0bc__builtin__\n__main__\nU\x08__repr__q\x0cNN}q\rtq\x0eRq\x0fNcdill.dill\n_create_type\nq\x10(h\x03U\x08TypeTypeq\x11\x85q\x12Rq\x13U\x06MyDictq\x14h\x03U\x08DictTypeq\x15\x85q\x16Rq\x17\x85q\x18}q\x19(U\n__module__q\x1aU\x08__main__q\x1bh\x0ch\x0fU\x07__doc__q\x1cNutq\x1dRq\x1e\x87q\x1fRq U\x01bq!K\x02u.'
>>> pickletools.dis(dill.dumps(d))
    0: \x80 PROTO      2
    2: }    EMPTY_DICT
    3: q    BINPUT     0
    5: (    MARK
    6: U        SHORT_BINSTRING 'a'
    9: q        BINPUT     1
   11: K        BININT1    1
   13: U        SHORT_BINSTRING 'c'
   16: q        BINPUT     2
   18: c        GLOBAL     'dill.dill _load_type'
   40: q        BINPUT     3
   42: U        SHORT_BINSTRING 'MethodType'
   54: q        BINPUT     4
   56: \x85     TUPLE1
   57: q        BINPUT     5
   59: R        REDUCE
   60: q        BINPUT     6
   62: c        GLOBAL     'dill.dill _create_function'
   90: q        BINPUT     7
   92: (        MARK
   93: c            GLOBAL     'dill.dill _unmarshal'
  115: q            BINPUT     8
  117: T            BINSTRING  'c\x01\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x00C\x00\x00\x00s#\x00\x00\x00d\x01\x00j\x00\x00t\x01\x00d\x02\x00\x84\x00\x00|\x00\x00j\x02\x00\x83\x00\x00D\x83\x01\x00\x83\x01\x00\x83\x01\x00S(\x03\x00\x00\x00Ns\n\x00\x00\x00MyDict({})c\x01\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00s\x00\x00\x00s\x15\x00\x00\x00|\x00\x00]\x0b\x00}\x01\x00|\x01\x00V\x01q\x03\x00d\x00\x00S(\x01\x00\x00\x00N(\x00\x00\x00\x00(\x02\x00\x00\x00t\x02\x00\x00\x00.0t\x01\x00\x00\x00i(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>s\t\x00\x00\x00<genexpr>\x03\x00\x00\x00s\x02\x00\x00\x00\x06\x00(\x03\x00\x00\x00t\x06\x00\x00\x00formatt\x04\x00\x00\x00dictt\x05\x00\x00\x00items(\x01\x00\x00\x00t\x04\x00\x00\x00self(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>t\x08\x00\x00\x00__repr__\x02\x00\x00\x00s\x02\x00\x00\x00\x00\x01'
  414: q            BINPUT     9
  416: \x85         TUPLE1
  417: q            BINPUT     10
  419: R            REDUCE
  420: q            BINPUT     11
  422: c            GLOBAL     '__builtin__ __main__'
  444: U            SHORT_BINSTRING '__repr__'
  454: q            BINPUT     12
  456: N            NONE
  457: N            NONE
  458: }            EMPTY_DICT
  459: q            BINPUT     13
  461: t            TUPLE      (MARK at 92)
  462: q        BINPUT     14
  464: R        REDUCE
  465: q        BINPUT     15
  467: N        NONE
  468: c        GLOBAL     'dill.dill _create_type'
  492: q        BINPUT     16
  494: (        MARK
  495: h            BINGET     3
  497: U            SHORT_BINSTRING 'TypeType'
  507: q            BINPUT     17
  509: \x85         TUPLE1
  510: q            BINPUT     18
  512: R            REDUCE
  513: q            BINPUT     19
  515: U            SHORT_BINSTRING 'MyDict'
  523: q            BINPUT     20
  525: h            BINGET     3
  527: U            SHORT_BINSTRING 'DictType'
  537: q            BINPUT     21
  539: \x85         TUPLE1
  540: q            BINPUT     22
  542: R            REDUCE
  543: q            BINPUT     23
  545: \x85         TUPLE1
  546: q            BINPUT     24
  548: }            EMPTY_DICT
  549: q            BINPUT     25
  551: (            MARK
  552: U                SHORT_BINSTRING '__module__'
  564: q                BINPUT     26
  566: U                SHORT_BINSTRING '__main__'
  576: q                BINPUT     27
  578: h                BINGET     12
  580: h                BINGET     15
  582: U                SHORT_BINSTRING '__doc__'
  591: q                BINPUT     28
  593: N                NONE
  594: u                SETITEMS   (MARK at 551)
  595: t            TUPLE      (MARK at 494)
  596: q        BINPUT     29
  598: R        REDUCE
  599: q        BINPUT     30
  601: \x87     TUPLE3
  602: q        BINPUT     31
  604: R        REDUCE
  605: q        BINPUT     32
  607: U        SHORT_BINSTRING 'b'
  610: q        BINPUT     33
  612: K        BININT1    2
  614: u        SETITEMS   (MARK at 5)
  615: .    STOP
highest protocol among opcodes = 2

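Dill serializes the class's method because it registers additional functions that know how to pickle and unpickle a much wider range of objects; you can see them in the disassembled code (the names starting with dill.dill). It's a much larger pickle, but it generally works for whatever you put in the dict.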

>>> from numpy import *
>>> everything = dill.dumps(globals())

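For a class derived from dict, you don't have to worry about unserializable objects in the class methods. However, the contents of the custom dict are still serialized with the class instance, so you do have to worry about unserializable objects stored inside it.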

Python 2.7.9 (default, Dec 11 2014, 01:21:43)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import pickle
>>> class MyDict(dict):
...   def __repr__(self):
...     return "MyDict({})".format(dict(i for i in self.items()))
...
>>> m = MyDict(a = lambda x:x)
>>> m
MyDict({'a': <function <lambda> at 0x10892b230>})
>>> pickle.dumps(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> pickle.dumps(m)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 401, in save_reduce
    save(args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 562, in save_tuple
    save(element)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 663, in _batch_setitems
    save(v)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 748, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <function <lambda> at 0x10892b230>: it's not found as __main__.<lambda>

A lambda can't be serialized by pickle because it has no name that pickle can refer to. Going back to dill, we see that this works.
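The point about names can also be checked with pickle alone. In this sketch (Python 3 assumed), `operator.neg` is used as an example of a function that can be found by module and name, and `try_dumps` is a hypothetical helper; neither comes from the original answer.

```python
import operator
import pickle


def try_dumps(obj):
    """Return the pickle bytes, or None if obj cannot be pickled."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return None


with_lambda = try_dumps({"a": lambda x: x})   # no importable name
with_named = try_dumps({"a": operator.neg})   # stored as a module/name reference

print(with_lambda is None)
print(with_named is not None)
```

The dict with the named function pickles because only the reference is written, while the lambda has nothing that can be looked up again at load time.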