Question

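I'd like to pass object state between two Python programs that run in different namespaces: one is my own standalone code, the other is a Pyramid view. Somewhat related questions are here and here, but I couldn't quite follow them.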

My own code defines a global class (i.e. in the __main__ namespace) with a somewhat complex structure:

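# An instance of this is a colorful mess of nested lists and sets and dicts.
class MyClass:
    def __init__(self):
        # Store the containers on the instance (self.); plain local variables
        # would vanish after __init__ and never become part of the pickled state.
        self.data = set()
        self.more = dict()
        ...

    def do_sth(self):
        ...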

At some point, I pickle an instance of this class:

import pickle

c = MyClass()
# Fill c with data.

# Pickle and write the MyClass instance within the __main__ namespace
# (protocol -1 selects the highest available pickle protocol).
with open("my_c.pik", "wb") as f:
    pickle.dump(c, f, -1)

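Running hexdump -C my_c.pik shows that the first few bytes contain __main__.MyClass. I assume this is because the class is defined in the global namespace, and that this is somehow required for reading the pickle back. Now I want to load this pickled MyClass instance from a Pyramid view, which I assume runs in a different namespace: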

import pickle

# In Pyramid (different namespace) read the pickled MyClass instance.
with open("my_c.pik", "rb") as f:
    c = pickle.load(f)

But this results in the following error:

File ".../views.py", line 60, in view_handler_bla
  c = pickle.load(f)
AttributeError: 'module' object has no attribute 'MyClass'

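So it seems the MyClass definition is missing from whatever namespace the view code runs in? I had hoped (assumed) that pickling was a somewhat opaque process that would let me read a blob of data back wherever I chose. (More on Python class names and namespaces is here.)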
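The failure can be reproduced in isolation. The pickle does not contain the class itself, only a reference of the form module.name, and unpickling resolves that reference by looking the name up as an attribute of the module. A minimal Python 3 sketch, assuming a throwaway module with the hypothetical name writer_main standing in for the writer's __main__ (so the sketch behaves the same however it is executed):

```python
import pickle
import sys
import types

# Hypothetical module playing the role of the writer program's __main__.
writer = types.ModuleType("writer_main")
sys.modules["writer_main"] = writer

class MyClass:
    def __init__(self):
        self.data = {1, 2, 3}

MyClass.__module__ = "writer_main"  # pickle will record writer_main.MyClass
writer.MyClass = MyClass

payload = pickle.dumps(MyClass())  # stores the reference, not the class

# Simulate the reader: the module exists but has no attribute MyClass,
# just like __main__ inside the Pyramid process.
del writer.MyClass
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc
print("unpickling failed:", error)
```

The error is the same kind of AttributeError as in the traceback above: the module is found, but the name is not.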

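How do I handle this correctly? (Ideally without having to import anything...) Can I somehow find the current namespace and inject MyClass into it (as this answer seems to suggest)?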

Poor solution

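It seems to me that this would not be a problem if I avoided defining and using MyClass and fell back to plain built-in data types instead. In fact, I can "serialize" the MyClass object as a series of calls that pickle the individual elements of the MyClass instance: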

# 'Manual' serialization of c works, because all elements are built-in types.
pickle.dump(c.data, f, -1)
pickle.dump(c.more, f, -1)
...

But that defeats the purpose of wrapping the data in a class.
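Reading such a file back means calling pickle.load once per dumped element, in exactly the order they were written, because each call consumes one pickled object from the stream. A minimal Python 3 sketch, assuming io.BytesIO in place of the open file f and hypothetical values for c.data and c.more:

```python
import io
import pickle

# Hypothetical stand-ins for c.data and c.more from the question.
data = {1, 2, 3}
more = {"a": 1, "b": 2}

buf = io.BytesIO()  # plays the role of the file object f
pickle.dump(data, buf, -1)
pickle.dump(more, buf, -1)

# Rewind and read the objects back one by one, in the same order.
buf.seek(0)
data_back = pickle.load(buf)
more_back = pickle.load(buf)
print(data_back, more_back)
```

Since these values are built-in types only, the reader needs no class definition; the price is that the reader must know the field order and rebuild the object by hand.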

Note

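Pickling only stores the state of an instance, not any functions defined in the class body (such as do_sth() in the example above). This means that loading a MyClass instance into a different namespace without the correct class definition (for example, against a stub class) only restores the instance data; calling a missing method such as do_sth() then raises an AttributeError.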
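What actually goes into the pickle can be inspected with the standard pickletools module, which is a more readable alternative to hexdump. A minimal Python 3 sketch, again assuming a throwaway module with the hypothetical name writer_main in place of __main__; it lists the string arguments of the pickle's opcodes:

```python
import pickle
import pickletools
import sys
import types

# Hypothetical throwaway module "writer_main" stands in for the writer's
# __main__, so the sketch behaves the same however it is executed.
writer = types.ModuleType("writer_main")
sys.modules["writer_main"] = writer

class MyClass:
    def __init__(self):
        self.data = {1, 2, 3}
        self.more = {"a": 1}

    def do_sth(self):
        return sorted(self.data)

MyClass.__module__ = "writer_main"  # pickle will record writer_main.MyClass
writer.MyClass = MyClass

payload = pickle.dumps(MyClass(), protocol=pickle.HIGHEST_PROTOCOL)

# Collect every string argument in the opcode stream: the module and class
# name of the reference, plus the string keys found in the instance state.
strings = [arg for _op, arg, _pos in pickletools.genops(payload)
           if isinstance(arg, str)]
print(strings)  # no trace of do_sth or any other method body
```

Only the (module, class name) reference and the instance's __dict__ appear; the methods are expected to come from whatever object that reference resolves to at load time.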

Answer 1

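Use dill instead of pickle, because by default dill pickles classes (at least those defined in __main__) by serializing the class definition rather than storing a reference to it.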

>>> import dill
>>> class MyClass:
...   def __init__(self): 
...     self.data = set()
...     self.more = dict()
...   def do_stuff(self):
...     return sorted(self.more)
... 
>>> c = MyClass()
>>> c.data.add(1)
>>> c.data.add(2)
>>> c.data.add(3)
>>> c.data
set([1, 2, 3])
>>> c.more['1'] = 1
>>> c.more['2'] = 2
>>> c.more['3'] = lambda x:x
>>> def more_stuff(self, x):  
...   return x+1
... 
>>> c.more_stuff = more_stuff
>>> 
>>> with open('my_c.pik', "wb") as f:
...   dill.dump(c, f)
... 
>>>

Close the session, then start again in a new one...

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('my_c.pik', "rb") as f:
...   c = dill.load(f)
... 
>>> c.data
set([1, 2, 3])
>>> c.more
{'1': 1, '3': <function <lambda> at 0x10473ec80>, '2': 2}
>>> c.do_stuff()
['1', '2', '3']
>>> c.more_stuff(5)
6

Get dill here: https://github.com/uqfoundation/dill

Answer 2

Solution 1

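When pickle.load runs, the module __main__ needs to have a function or class named MyClass. It does not have to be the original class from the original source code; you can add other methods to it, and it should still work.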

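import pickle

# A stand-in class. It must be reachable as __main__.MyClass when
# pickle.load runs, i.e. this code has to live in the script run as __main__.
class MyClass(object):
    pass

with open("my_c.pik", "rb") as f:
    c = pickle.load(f)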
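Inside a Pyramid process, __main__ is whatever script started the server, not views.py, so a class defined in the view module is not __main__.MyClass. One way to apply Solution 1 there is to register the stand-in on sys.modules["__main__"] before loading, which is the "inject MyClass" idea from the question. A minimal Python 3 sketch; the writer side is simulated in the same file, and StandInMyClass and load_my_c are hypothetical names:

```python
import pickle
import sys

main_module = sys.modules["__main__"]

# --- Simulated writer (the standalone script) ---
class MyClass:
    def __init__(self):
        self.data = {1, 2, 3}
        self.more = {"a": 1}

MyClass.__module__ = "__main__"  # pickle records the class as __main__.MyClass
main_module.MyClass = MyClass    # a no-op when this file itself runs as __main__
payload = pickle.dumps(MyClass(), protocol=pickle.HIGHEST_PROTOCOL)
del main_module.MyClass          # simulate a reader whose __main__ lacks MyClass

# --- Reader (e.g. code inside a Pyramid view module) ---
class StandInMyClass:
    """Hypothetical stand-in; it only needs the methods the view calls."""

    def do_sth(self):
        return sorted(self.data)

def load_my_c(blob):
    # Unpickling looks MyClass up as an attribute of sys.modules["__main__"],
    # so register the stand-in there first if nothing is there yet.
    main = sys.modules["__main__"]
    if not hasattr(main, "MyClass"):
        main.MyClass = StandInMyClass
    # __init__ is not called here; the saved __dict__ is restored directly.
    return pickle.loads(blob)

c = load_my_c(payload)
print(type(c).__name__, c.do_sth())
```

Patching __main__ is process-global, so in a real application this is best done once at startup rather than on every request.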

Solution 2

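Use the copyreg module (copy_reg in Python 2), which registers constructors and pickling functions, to control how specific objects are pickled. This is the example the module documentation gives for complex numbers: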

import copyreg  # copy_reg in Python 2

def pickle_complex(c):
    # Reduce a complex number to (constructor, args).
    return complex, (c.real, c.imag)

copyreg.pickle(complex, pickle_complex, complex)
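Applied to the question, a registered reduction function decides which callable the pickle refers to. A minimal Python 3 sketch (not taken from the copyreg documentation; reduce_myclass is a hypothetical name) that reduces MyClass to the built-in dict, so any reader can load the data without a MyClass definition, but gets a plain dict back:

```python
import copyreg
import pickle

class MyClass:
    def __init__(self):
        self.data = {1, 2, 3}
        self.more = {"a": 1}

def reduce_myclass(obj):
    # Return (callable, args). Because the callable is the built-in dict,
    # the pickle references builtins.dict instead of MyClass.
    return dict, ({"data": set(obj.data), "more": dict(obj.more)},)

# Registers globally in copyreg.dispatch_table for every pickler in the process.
copyreg.pickle(MyClass, reduce_myclass)

payload = pickle.dumps(MyClass(), protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)  # a plain dict; no MyClass needed to load it
print(restored)
```

This trades the class for portability, much like the manual serialization in the question; reducing to a constructor that the reader can import instead would keep the class but bring back the lookup requirement.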

Solution 3

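Override the persistent_id method of Pickler and the persistent_load method of Unpickler. pickler.persistent_id(obj) returns an identifier, and unpickler.persistent_load(pid) resolves that identifier back to an object.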
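A minimal Python 3 sketch of Solution 3, loosely following the persistent-ID pattern from the pickle documentation. The names MyPickler, MyUnpickler and ReaderMyClass, and the ("MyClass", state) identifier format, are hypothetical choices for illustration. The writer replaces each MyClass instance with an identifier made of built-ins, and the reader rebuilds it from a class of its own choosing, so no module lookup of MyClass ever happens:

```python
import io
import pickle

class MyClass:
    """Writer-side class (as in the question)."""

    def __init__(self):
        self.data = {1, 2, 3}
        self.more = {"a": 1}

class MyPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Replace each MyClass instance with a tagged tuple of built-ins;
        # returning None lets every other object pickle normally.
        if isinstance(obj, MyClass):
            return ("MyClass", dict(vars(obj)))
        return None

class ReaderMyClass:
    """Hypothetical reader-side class; its name and module need not match."""

    def do_sth(self):
        return sorted(self.data)

class MyUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        tag, state = pid
        if tag != "MyClass":
            raise pickle.UnpicklingError("unsupported persistent object: %r" % (tag,))
        obj = ReaderMyClass.__new__(ReaderMyClass)  # skip __init__
        obj.__dict__.update(state)
        return obj

buf = io.BytesIO()
MyPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump({"c": MyClass()})
buf.seek(0)
loaded = MyUnpickler(buf).load()
print(type(loaded["c"]).__name__, loaded["c"].do_sth())
```

The writer and the reader have to agree on the identifier format; beyond that, each side can keep its class wherever it likes.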

Answer 3

The simplest solution is to use cloudpickle:

https://github.com/cloudpipe/cloudpickle

This let me easily send pickled class files to another machine and unpickle them there with cloudpickle again.

Storing an object with Python pickle and loading it into a different namespace
