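After reading the pickle documentation, I got the impression that a class needs to implement either __reduce__ or __getstate__ in order to be pickled correctly. But how does pickling of dictionaries work, then? They have neither of those attributes: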
> dict(a=1).__reduce__()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/daniyar/work/Apr24/<ipython-input-30-bc1cbd43305b> in <module>()
----> 1 dict(a=1).__reduce__()
/usr/lib/python2.6/copy_reg.pyc in _reduce_ex(self, proto)
68 else:
69 if base is self.__class__:
---> 70 raise TypeError, "can't pickle %s objects" % base.__name__
71 state = base(self)
72 args = (self.__class__, base, state)
TypeError: can't pickle dict objects
> dict(a=1).__getstate__()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/daniyar/work/Apr24/<ipython-input-31-00932fb40067> in <module>()
----> 1 dict(a=1).__getstate__()
AttributeError: 'dict' object has no attribute '__getstate__'
Also, how are classes derived from dict pickled?
Answer 0 (score: 8)
The pickle module handles a number of types "natively". Types it does not handle natively need to implement the "pickle protocol". Dicts and simple subclasses of them are handled natively.
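A minimal sketch of what "handled natively" means in practice, assuming Python 3 (the transcripts in this thread are Python 2): built-in containers round-trip through pickle without any user-defined hooks. The variable names are illustrative only.

```python
import pickle

# Built-in containers nest freely; none of them needs a user-defined
# __reduce__ or __getstate__ to be pickled.
original = {"a": 1, "b": [1.5, "text", None], "c": {"nested": (True, b"bytes")}}

payload = pickle.dumps(original)   # a bytes object in Python 3
restored = pickle.loads(payload)

print(restored == original)   # same contents
print(restored is original)   # but a separate object
```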
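Answer 1 (score: 4)
The __reduce__ and __getstate__ methods are the low-level hooks of pickling. You implement them on a custom class when it needs special handling from the interpreter.
For instance, if an instance of an extension class sits inside a dictionary you are trying to pickle, and that class does not implement the methods that say how to pickle it, the entire dictionary becomes unpicklable.
The interpreter knows how to pickle built-ins. To pickle a dictionary, use pickle.dump or pickle.dumps rather than calling __reduce__ or __getstate__ directly.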
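The point about one value spoiling the whole dictionary can be sketched as follows, assuming Python 3. Here threading.Lock is used as a stand-in for "an instance of an extension class with no pickle support"; that substitution and the helper name try_pickle are assumptions made for illustration.

```python
import pickle
import threading

def try_pickle(obj):
    """Return (True, payload) on success or (False, exception) on failure."""
    try:
        return True, pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError) as exc:
        return False, exc

# A dict of plain values pickles fine.
ok_plain, _ = try_pickle({"a": 1})

# The same dict with one unpicklable value fails as a whole.
ok_lock, error = try_pickle({"a": 1, "lock": threading.Lock()})

print(ok_plain)
print(ok_lock, type(error).__name__)
```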
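Answer 2 (score: 3)
Neither __reduce__ nor __getstate__ is required for pickling. They are methods you can use to control pickling, but pickle works with the built-in types without them.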
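The same holds for ordinary user-defined classes. A small sketch, assuming Python 3 and a hypothetical Point class that defines no pickling hooks at all:

```python
import pickle

class Point:
    """A plain class with no __reduce__, __getstate__ or __setstate__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
q = pickle.loads(pickle.dumps(p))

# By default pickle saves the instance __dict__ and restores it
# without calling __init__ again.
print(type(q) is Point, q.x, q.y)
```

The class itself must still be importable by name when the pickle is loaded.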
Answer 3 (score: 1)
A useful answer I got from here:
The following is what goes inside __getstate__ and __setstate__. Even if you cannot use it as-is, you can start from something like this:
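def __getstate__(self):
    # Copy the instance dict so the caller can modify the state safely.
    result = self.__dict__.copy()
    return result

def __setstate__(self, state):
    # 'state' is whatever __getstate__ returned (renamed from 'dict'
    # to avoid shadowing the built-in).
    self.__dict__ = state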
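To show where such a skeleton becomes useful, here is a hedged sketch (assuming Python 3) of a hypothetical Counter class that holds an unpicklable lock. The class and attribute names are made up for illustration; the hooks drop the lock when pickling and recreate it when unpickling.

```python
import pickle
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()   # locks cannot be pickled

    def increment(self):
        with self._lock:
            self.value += 1

    def __getstate__(self):
        # Start from a copy of the instance dict and drop the lock.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then create a fresh lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()

c = Counter()
c.increment()
clone = pickle.loads(pickle.dumps(c))
clone.increment()
print(c.value, clone.value)
```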
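Answer 4 (score: 1)
All good answers, but they overlook this part of the question:
"Also, how are classes derived from dict pickled?"
Like any other class, they are pickled by reference. If you look at the pickle, you can see what Python is doing.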
>>> class MyDict(dict):
...     def __repr__(self):
...         return "MyDict({})".format(dict(i for i in self.items()))
...
>>> m = MyDict(a=1,b=2)
>>> m
MyDict({'a': 1, 'b': 2})
>>> import pickle
>>> # reconstructor called on class MyDict that lives in __main__
>>> # and contains a __builtin__ dict with contents ('a' and 'b')
>>> pickle.dumps(m)
"ccopy_reg\n_reconstructor\np0\n(c__main__\nMyDict\np1\nc__builtin__\ndict\np2\n(dp3\nS'a'\np4\nI1\nsS'b'\np5\nI2\nstp6\nRp7\n."
>>> m.clear()
>>> # removing the contents, to show how that affects the pickle
>>> pickle.dumps(m)
'ccopy_reg\n_reconstructor\np0\n(c__main__\nMyDict\np1\nc__builtin__\ndict\np2\n(dp3\ntp4\nRp5\n.'
>>> # now, just looking at the class itself, you can see it's by reference
>>> pickle.dumps(MyDict)
'c__main__\nMyDict\np0\n.'
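Alternatively, we can do the same thing but inspect the disassembled pickle. You can see exactly which instructions are stored.
>>> import pickletools
>>> pickletools.dis(pickle.dumps(m))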
0: c GLOBAL 'copy_reg _reconstructor'
25: p PUT 0
28: ( MARK
29: c GLOBAL '__main__ MyDict'
46: p PUT 1
49: c GLOBAL '__builtin__ dict'
67: p PUT 2
70: ( MARK
71: d DICT (MARK at 70)
72: p PUT 3
75: t TUPLE (MARK at 28)
76: p PUT 4
79: R REDUCE
80: p PUT 5
83: . STOP
highest protocol among opcodes = 0
>>> pickletools.dis(pickle.dumps(MyDict))
0: c GLOBAL '__main__ MyDict'
17: p PUT 0
20: . STOP
highest protocol among opcodes = 0
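The class is definitely stored by reference, even though it derives from dict rather than object. The reference is by name, which means that once the __main__ session is closed, the class definition is lost, and a pickle that depends on MyDict will not load.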
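The consequence of pickling by name can be sketched like this, assuming Python 3. Deleting the class is used here to simulate a fresh session in which MyDict was never defined; that simulation is an assumption of the sketch, not something from the original answer.

```python
import pickle

class MyDict(dict):
    pass

instance_payload = pickle.dumps(MyDict(a=1, b=2))
class_payload = pickle.dumps(MyDict)

# The pickle of the class holds its module and name, not its definition.
print(b"MyDict" in class_payload, len(class_payload))

# Simulate a new session: the name MyDict no longer exists.
del MyDict

try:
    pickle.loads(instance_payload)
    load_error = None
except AttributeError as exc:
    # The unpickler cannot find the class it was told to look up.
    load_error = exc

print(type(load_error).__name__)
```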
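Now, let's look at the dict. A pickle of a dict first relies on Python knowing how to serialize fundamental objects such as dict (as mentioned in the other answers), and then it serializes the contents. You can see that it has two strings inside (the keys 'a' and 'b'), which Python also knows how to serialize natively.
This means that if the dict contains an unserializable object, pickling it will fail.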
>>> d = dict(a=1, b=2)
>>> d['c'] = MyDict.__repr__
>>> d
{'a': 1, 'c': <unbound method MyDict.__repr__>, 'b': 2}
>>> pickle.dumps(d)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1374, in dumps
Pickler(file, protocol).dump(obj)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 663, in _batch_setitems
save(v)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 306, in save
rv = reduce(self.proto)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle instance method objects
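When a large dict fails like this, it can help to find out which entries are responsible. A small sketch, assuming Python 3; the helper unpicklable_keys is hypothetical, and a lambda is used as the offending value because it is not picklable by the standard pickle module.

```python
import pickle

def unpicklable_keys(mapping):
    """Return the keys whose values cannot be pickled on their own."""
    bad = []
    for key, value in mapping.items():
        try:
            pickle.dumps(value)
        except Exception:  # pickle can raise several exception types
            bad.append(key)
    return bad

d = {"a": 1, "b": 2, "c": lambda x: x}
print(unpicklable_keys(d))
```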
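By the way, we can do better with a better serializer. Using dill instead of pickle lets you serialize most objects. As you can see below, the resulting pickle of the dict is much more complicated.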
>>> import dill
>>> dill.dumps(d)
'\x80\x02}q\x00(U\x01aq\x01K\x01U\x01cq\x02cdill.dill\n_load_type\nq\x03U\nMethodTypeq\x04\x85q\x05Rq\x06cdill.dill\n_create_function\nq\x07(cdill.dill\n_unmarshal\nq\x08T$\x01\x00\x00c\x01\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x00C\x00\x00\x00s#\x00\x00\x00d\x01\x00j\x00\x00t\x01\x00d\x02\x00\x84\x00\x00|\x00\x00j\x02\x00\x83\x00\x00D\x83\x01\x00\x83\x01\x00\x83\x01\x00S(\x03\x00\x00\x00Ns\n\x00\x00\x00MyDict({})c\x01\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00s\x00\x00\x00s\x15\x00\x00\x00|\x00\x00]\x0b\x00}\x01\x00|\x01\x00V\x01q\x03\x00d\x00\x00S(\x01\x00\x00\x00N(\x00\x00\x00\x00(\x02\x00\x00\x00t\x02\x00\x00\x00.0t\x01\x00\x00\x00i(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>s\t\x00\x00\x00<genexpr>\x03\x00\x00\x00s\x02\x00\x00\x00\x06\x00(\x03\x00\x00\x00t\x06\x00\x00\x00formatt\x04\x00\x00\x00dictt\x05\x00\x00\x00items(\x01\x00\x00\x00t\x04\x00\x00\x00self(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>t\x08\x00\x00\x00__repr__\x02\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\t\x85q\nRq\x0bc__builtin__\n__main__\nU\x08__repr__q\x0cNN}q\rtq\x0eRq\x0fNcdill.dill\n_create_type\nq\x10(h\x03U\x08TypeTypeq\x11\x85q\x12Rq\x13U\x06MyDictq\x14h\x03U\x08DictTypeq\x15\x85q\x16Rq\x17\x85q\x18}q\x19(U\n__module__q\x1aU\x08__main__q\x1bh\x0ch\x0fU\x07__doc__q\x1cNutq\x1dRq\x1e\x87q\x1fRq U\x01bq!K\x02u.'
>>> pickletools.dis(dill.dumps(d))
0: \x80 PROTO 2
2: } EMPTY_DICT
3: q BINPUT 0
5: ( MARK
6: U SHORT_BINSTRING 'a'
9: q BINPUT 1
11: K BININT1 1
13: U SHORT_BINSTRING 'c'
16: q BINPUT 2
18: c GLOBAL 'dill.dill _load_type'
40: q BINPUT 3
42: U SHORT_BINSTRING 'MethodType'
54: q BINPUT 4
56: \x85 TUPLE1
57: q BINPUT 5
59: R REDUCE
60: q BINPUT 6
62: c GLOBAL 'dill.dill _create_function'
90: q BINPUT 7
92: ( MARK
93: c GLOBAL 'dill.dill _unmarshal'
115: q BINPUT 8
117: T BINSTRING 'c\x01\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x00C\x00\x00\x00s#\x00\x00\x00d\x01\x00j\x00\x00t\x01\x00d\x02\x00\x84\x00\x00|\x00\x00j\x02\x00\x83\x00\x00D\x83\x01\x00\x83\x01\x00\x83\x01\x00S(\x03\x00\x00\x00Ns\n\x00\x00\x00MyDict({})c\x01\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00s\x00\x00\x00s\x15\x00\x00\x00|\x00\x00]\x0b\x00}\x01\x00|\x01\x00V\x01q\x03\x00d\x00\x00S(\x01\x00\x00\x00N(\x00\x00\x00\x00(\x02\x00\x00\x00t\x02\x00\x00\x00.0t\x01\x00\x00\x00i(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>s\t\x00\x00\x00<genexpr>\x03\x00\x00\x00s\x02\x00\x00\x00\x06\x00(\x03\x00\x00\x00t\x06\x00\x00\x00formatt\x04\x00\x00\x00dictt\x05\x00\x00\x00items(\x01\x00\x00\x00t\x04\x00\x00\x00self(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>t\x08\x00\x00\x00__repr__\x02\x00\x00\x00s\x02\x00\x00\x00\x00\x01'
414: q BINPUT 9
416: \x85 TUPLE1
417: q BINPUT 10
419: R REDUCE
420: q BINPUT 11
422: c GLOBAL '__builtin__ __main__'
444: U SHORT_BINSTRING '__repr__'
454: q BINPUT 12
456: N NONE
457: N NONE
458: } EMPTY_DICT
459: q BINPUT 13
461: t TUPLE (MARK at 92)
462: q BINPUT 14
464: R REDUCE
465: q BINPUT 15
467: N NONE
468: c GLOBAL 'dill.dill _create_type'
492: q BINPUT 16
494: ( MARK
495: h BINGET 3
497: U SHORT_BINSTRING 'TypeType'
507: q BINPUT 17
509: \x85 TUPLE1
510: q BINPUT 18
512: R REDUCE
513: q BINPUT 19
515: U SHORT_BINSTRING 'MyDict'
523: q BINPUT 20
525: h BINGET 3
527: U SHORT_BINSTRING 'DictType'
537: q BINPUT 21
539: \x85 TUPLE1
540: q BINPUT 22
542: R REDUCE
543: q BINPUT 23
545: \x85 TUPLE1
546: q BINPUT 24
548: } EMPTY_DICT
549: q BINPUT 25
551: ( MARK
552: U SHORT_BINSTRING '__module__'
564: q BINPUT 26
566: U SHORT_BINSTRING '__main__'
576: q BINPUT 27
578: h BINGET 12
580: h BINGET 15
582: U SHORT_BINSTRING '__doc__'
591: q BINPUT 28
593: N NONE
594: u SETITEMS (MARK at 551)
595: t TUPLE (MARK at 494)
596: q BINPUT 29
598: R REDUCE
599: q BINPUT 30
601: \x87 TUPLE3
602: q BINPUT 31
604: R REDUCE
605: q BINPUT 32
607: U SHORT_BINSTRING 'b'
610: q BINPUT 33
612: K BININT1 2
614: u SETITEMS (MARK at 5)
615: . STOP
highest protocol among opcodes = 2
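dill serializes the method because it registers additional functions that know how to pickle and unpickle a broader range of objects. You can see them in the disassembled code (their names start with dill.dill). It is a much bigger pickle, but it generally works for whatever you put into the dict.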
>>> from numpy import *
>>> everything = dill.dumps(globals())
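For a class derived from dict, you don't have to worry about unserializable objects in the class's methods. However, the contents of the custom dict are still serialized along with the class instance, so you do have to worry about unserializable objects stored in the instance.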
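As a small illustration of what travels with the instance, assuming Python 3: both the mapping contents and any instance attributes of a dict subclass are restored, while the class itself is only looked up by name. The attribute name note is made up for this sketch.

```python
import pickle

class MyDict(dict):
    pass

m = MyDict(a=1, b=2)
m.note = "instance attribute"   # stored in m.__dict__, not in the mapping

restored = pickle.loads(pickle.dumps(m))

# The type, the key/value contents and the instance attribute all survive.
print(type(restored) is MyDict, dict(restored), restored.note)
```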
Python 2.7.9 (default, Dec 11 2014, 01:21:43)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import pickle
>>> class MyDict(dict):
...     def __repr__(self):
...         return "MyDict({})".format(dict(i for i in self.items()))
...
>>> m = MyDict(a = lambda x:x)
>>> m
MyDict({'a': <function <lambda> at 0x10892b230>})
>>> pickle.dumps(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> pickle.dumps(m)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1374, in dumps
Pickler(file, protocol).dump(obj)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 401, in save_reduce
save(args)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 562, in save_tuple
save(element)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 663, in _batch_setitems
save(v)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 748, in save_global
(obj, module, name))
pickle.PicklingError: Can't pickle <function <lambda> at 0x10892b230>: it's not found as __main__.<lambda>
The lambda fails to serialize because it has no name that pickle can refer to. Going back to dill, we see that this works.
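A hedged sketch of that last step, assuming Python 3 and that the third-party dill package is installed. The import is guarded so the snippet still runs without it. Only dill.dumps and dill.loads are used, and the value 42 is arbitrary.

```python
try:
    import dill  # third-party package, e.g. installed with pip
except ImportError:
    dill = None

class MyDict(dict):
    pass

# A dict subclass whose contents include a lambda, as in the session above.
m = MyDict(a=lambda x: x)

if dill is not None:
    restored = dill.loads(dill.dumps(m))
    # The lambda was serialized by value, so it can be called again.
    print(type(restored).__name__, restored["a"](42))
```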