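Python: Checking whether an object is atomically pickleable

Time: 2010-11-16 23:03:13

Tags: python pickle

What's an accurate way to check whether an object can be atomically pickled? By "atomically pickled" I mean pickled without considering the other objects it may refer to. For example, this list: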

l = [threading.Lock()]

is not a pickleable object, because it refers to a Lock, which is not pickleable. But atomically, the list itself is pickleable.

So how do you check whether an object is atomically pickleable? (I'm guessing the check should be done on the class, but I'm not sure.)

I want it to behave like this:

>>> is_atomically_pickleable(3)
True
>>> is_atomically_pickleable(3.1)
True
>>> is_atomically_pickleable([1, 2, 3])
True
>>> is_atomically_pickleable(threading.Lock())
False
>>> is_atomically_pickleable(open('whatever', 'r'))
False

5 Answers:

Answer 0: (score: 3)

Given that you're willing to break encapsulation, I think this is the best you can do:

from pickle import Pickler
import os

class AtomicPickler(Pickler):
  def __init__(self, protocol):
    # You may want to replace this with a fake file object that just
    # discards writes.
    blackhole = open(os.devnull, 'w')

    Pickler.__init__(self, blackhole, protocol)
    self.depth = 0

  def save(self, o):
    self.depth += 1
    if self.depth == 1:
      return Pickler.save(self, o)
    self.depth -= 1
    return

def is_atomically_pickleable(o, protocol=None):
  pickler = AtomicPickler(protocol)
  try:
    pickler.dump(o)
    return True
  except:
    # Hopefully this exception was actually caused by dump(), and not
    # something like a KeyboardInterrupt
    return False

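In Python, the only way to tell whether something will work is to try it. That's the nature of a language as dynamic as Python. The difficulty with your question is that you want to distinguish failures at the top level from failures at deeper levels.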

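Pickler.save is essentially the control center of Python's pickling logic, so the code above creates a modified Pickler that ignores recursive calls to its save method. Any exception raised inside the top-level save is treated as a pickling failure. You may want to add qualifiers to the except statement. Unqualified excepts are generally a bad idea in Python, because exceptions are used not only for program errors but also for things like KeyboardInterrupt and SystemExit.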

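This can give what are arguably false negatives for types with odd custom pickling logic. For example, suppose you create a custom list-like class that, instead of causing Pickler.save to be called recursively, tries to pickle its elements on its own somehow. If you then create an instance of this class containing an element that its custom logic cannot pickle, is_atomically_pickleable will return False for that instance, even though removing the offending element would result in a pickleable object.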

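Also, note the protocol argument to is_atomically_pickleable. In theory an object could behave differently when pickled with different protocols (though that would be pretty weird), so you should make this match the protocol argument that you give to dump.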
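The code above targets Python 2, where Pickler is a pure-Python class. A rough Python 3 sketch of the same idea follows. It assumes the private pure-Python class pickle._Pickler is available (the public pickle.Pickler is usually the C implementation, which does not route nested objects through an overridable save), and it narrows the except clause to the exception types pickling most commonly raises. Treat it as an illustration under those assumptions rather than a supported API.

```python
import io
import pickle
import threading


class AtomicPickler(pickle._Pickler):  # private, pure-Python pickler (assumption)
    """Pickle only the top-level object; nested save() calls are ignored."""

    def __init__(self, protocol=None):
        # The output is thrown away, so an in-memory buffer is enough.
        super().__init__(io.BytesIO(), protocol)
        self.depth = 0

    def save(self, obj, save_persistent_id=True):
        self.depth += 1
        try:
            if self.depth == 1:
                return super().save(obj, save_persistent_id)
            # Deeper calls fall through: referenced objects are not examined.
        finally:
            self.depth -= 1


def is_atomically_pickleable(obj, protocol=None):
    try:
        AtomicPickler(protocol).dump(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


print(is_atomically_pickleable(3))
print(is_atomically_pickleable([threading.Lock()]))
print(is_atomically_pickleable(threading.Lock()))
```

The bytes written by this pickler are not a usable pickle, since nested objects are skipped; only the success or failure of the top-level save matters here.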

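Answer 1: (score: 1)

Given the dynamic nature of Python, I don't think there's really a well-defined way to do what you're asking, aside from heuristics or a whitelist.

If I say: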

x = object()

is x "atomically pickleable"? What if I say:

x.foo = threading.Lock()

? Is x "atomically pickleable" now?

What if I made a separate class that always had a lock attribute? What if I deleted that attribute from an instance?
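To make the point concrete, here is a small Python 3 sketch. It uses types.SimpleNamespace as a stand-in for x, because a bare object() instance does not accept new attributes, and can_pickle is a hypothetical helper rather than anything provided by pickle. The same object moves from pickleable to unpickleable and back purely through attribute changes, which is why a check based only on the type can be no more than a heuristic.

```python
import pickle
import threading
import types


def can_pickle(obj):
    """Hypothetical helper: True if a full (non-atomic) pickle succeeds."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


x = types.SimpleNamespace()
before = can_pickle(x)        # nothing problematic attached yet

x.foo = threading.Lock()
with_lock = can_pickle(x)     # the lock attribute makes full pickling fail

del x.foo
after = can_pickle(x)         # same object, pickleable again

print(before, with_lock, after)
```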

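Answer 2: (score: 1)

I think the persistent_id interface is a poor match for what you are trying to do. It is designed for cases where your object should refer to equivalent objects in the new program rather than to copies of the old ones. You are trying to filter out every object that cannot be pickled, which is a different task, and it raises the question of why you are attempting this at all.

I think this is a sign of a problem in your code. The fact that you want to pickle objects that refer to GUI widgets, files, and locks suggests that you are doing something strange. The kinds of objects you typically persist shouldn't be related to, or hold references to, that sort of object.

Having said that, I think your best option is the following: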

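from pickle import Pickler, PicklingError

class MyPickler(Pickler):
    def save(self, obj):
        try:
            Pickler.save(self, obj)
        except PicklingError:
            # FilteredObject is a placeholder class that you define yourself.
            Pickler.save(self, FilteredObject(obj))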

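This should work for the Python implementation; I make no guarantees about what will happen in the C implementation. Every object that gets saved is passed to the save method. When it cannot pickle an object, this method raises PicklingError. At that point you can step in and call the function again, asking it to pickle your own object instead, which should pickle just fine.

EDIT

From my understanding, you basically have a dictionary of objects created by the user. Some of the objects are pickleable and some aren't. I'd do this: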
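As a rough Python 3 illustration of the "step in and substitute" idea, the sketch below again assumes the private pure-Python pickle._Pickler, and it substitutes a plain placeholder string instead of a FilteredObject instance so that the example stays self-contained. FilteringPickler and dumps_filtered are hypothetical names. One caveat under these assumptions: if a failure happens after part of an object has already been written, the stream may be left inconsistent, so this only covers objects that fail before any output, such as locks.

```python
import io
import pickle
import threading


class FilteringPickler(pickle._Pickler):  # private, pure-Python pickler (assumption)
    def save(self, obj, save_persistent_id=True):
        try:
            return super().save(obj, save_persistent_id)
        except (pickle.PicklingError, TypeError):
            # Replace the offending object with a descriptive string.
            placeholder = "<unpicklable %s>" % type(obj).__name__
            return super().save(placeholder, save_persistent_id)


def dumps_filtered(obj, protocol=None):
    buffer = io.BytesIO()
    FilteringPickler(buffer, protocol).dump(obj)
    return buffer.getvalue()


restored = pickle.loads(dumps_filtered([1, threading.Lock(), "a"]))
print(restored)
```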

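import cPickle
from cPickle import PicklingError

class saveable_dict(dict):
    def __getstate__(self):
        data = {}
        for key, value in self.items():
            try:
                encoded = cPickle.dumps(value)
            except PicklingError:
                # Unpicklable is a placeholder class that you define yourself.
                encoded = cPickle.dumps(Unpicklable())
            # Store each value as its own independently pickled string.
            data[key] = encoded
        return data

    def __setstate__(self, state):
        for key, value in state.items():
            self[key] = cPickle.loads(value)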

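Then use that dictionary when you want to save that collection of objects. The user should be able to get any pickleable objects back, but everything else will come back as an Unpicklable() object. The difference between this and the previous approach lies in objects that are themselves pickleable but hold references to unpickleable objects. Those objects are probably going to come back broken anyway.

This approach also has the benefit that it stays entirely within the defined API, so it should work with either cPickle or pickle.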
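The per-value encoding idea can be sketched in Python 3 with two plain functions. This is an illustration under assumptions, not the answer's exact class: encode_values, decode_values, and the UNPICKLABLE sentinel string are hypothetical names, and the sentinel stands in for the Unpicklable() placeholder object.

```python
import pickle
import threading

UNPICKLABLE = "<unpicklable>"  # hypothetical sentinel standing in for Unpicklable()


def encode_values(mapping):
    """Pickle each value on its own; failures become the sentinel."""
    encoded = {}
    for key, value in mapping.items():
        try:
            encoded[key] = pickle.dumps(value)
        except (pickle.PicklingError, TypeError, AttributeError):
            encoded[key] = pickle.dumps(UNPICKLABLE)
    return encoded


def decode_values(encoded):
    """Inverse of encode_values: unpickle each stored value."""
    return {key: pickle.loads(blob) for key, blob in encoded.items()}


user_objects = {"n": 1, "lock": threading.Lock(), "items": [1, 2]}
restored = decode_values(encode_values(user_objects))
print(restored)
```

Because every value is pickled separately, one bad value cannot prevent the rest of the collection from being saved.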

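Answer 3: (score: 0)

I ended up writing my own solution.

Here's the code. Here are the tests. It's part of GarlicSim, so you can use it by installing garlicsim and doing from garlicsim.general_misc import pickle_tools.

If you want to use it in Python 3 code, use the Python 3 fork of garlicsim.

Here is an excerpt from the module (it may be outdated):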

import re
import cPickle as pickle_module
import pickle # Importing just to get dispatch table, not pickling with it.
import copy_reg
import types

from garlicsim.general_misc import address_tools
from garlicsim.general_misc import misc_tools


def is_atomically_pickleable(thing):
    '''
    Return whether `thing` is an atomically pickleable object.

    "Atomically-pickleable" means that it's pickleable without considering any
    other object that it contains or refers to. For example, a `list` is
    atomically pickleable, even if it contains an unpickleable object, like a
    `threading.Lock()`.

    However, the `threading.Lock()` itself is not atomically pickleable.
    '''
    my_type = misc_tools.get_actual_type(thing)
    return _is_type_atomically_pickleable(my_type, thing)


def _is_type_atomically_pickleable(type_, thing=None):
    '''Return whether `type_` is an atomically pickleable type.'''
    try:
        return _is_type_atomically_pickleable.cache[type_]
    except KeyError:
        pass

    if thing is not None:
        assert isinstance(thing, type_)

    # Sub-function in order to do caching without crowding the main algorithm:
    def get_result():

        # We allow a flag for types to painlessly declare whether they're
        # atomically pickleable:
        if hasattr(type_, '_is_atomically_pickleable'):
            return type_._is_atomically_pickleable

        # Weird special case: `threading.Lock` objects don't have `__class__`.
        # We assume that objects that don't have `__class__` can't be pickled.
        # (With the exception of old-style classes themselves.)
        if not hasattr(thing, '__class__') and \
           (not isinstance(thing, types.ClassType)):
            return False

        if not issubclass(type_, object):
            return True

        def assert_legit_pickling_exception(exception):
            '''Assert that `exception` reports a problem in pickling.'''
            message = exception.args[0]
            segments = [
                "can't pickle",
                'should only be shared between processes through inheritance',
                'cannot be passed between processes or pickled'
            ]
            assert any((segment in message) for segment in segments)
            # todo: turn to warning

        if type_ in pickle.Pickler.dispatch:
            return True

        reduce_function = copy_reg.dispatch_table.get(type_)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True

        reduce_function = getattr(type_, '__reduce_ex__', None)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing, 0)
                # (The `0` is the protocol argument.)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True

        reduce_function = getattr(type_, '__reduce__', None)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True

        return False

    result = get_result()
    _is_type_atomically_pickleable.cache[type_] = result
    return result

_is_type_atomically_pickleable.cache = {}
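The excerpt above is Python 2 code (cPickle, copy_reg, old-style classes). A condensed Python 3 sketch of its core idea, asking the object to reduce itself without pickling anything it refers to, might look like the following. It is not the GarlicSim implementation: looks_atomically_pickleable is a hypothetical name, the caching and the per-type flag are omitted, and it reads the private pickle._Pickler.dispatch table for the built-in types.

```python
import copyreg
import pickle
import threading


def looks_atomically_pickleable(thing, protocol=pickle.DEFAULT_PROTOCOL):
    """Heuristic: can `thing` be reduced, ignoring what it refers to?"""
    type_ = type(thing)

    # Types the pure-Python pickler handles natively (int, list, dict, ...).
    if type_ in pickle._Pickler.dispatch:  # private attribute (assumption)
        return True

    # Reducers registered through copyreg take priority over __reduce_ex__.
    reduce_function = copyreg.dispatch_table.get(type_)
    try:
        if reduce_function is not None:
            reduce_function(thing)
        else:
            thing.__reduce_ex__(protocol)
    except (TypeError, pickle.PicklingError):
        return False
    return True


print(looks_atomically_pickleable([threading.Lock()]))
print(looks_atomically_pickleable(threading.Lock()))
```

Calling the reducer only produces the reduction tuple; the referenced objects are never pickled, which is what keeps the check atomic.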

Answer 4: (score: 0)

dill has the pickles method for this kind of check.

>>> import threading
>>> l = [threading.Lock()]
>>> 
>>> import dill
>>> dill.pickles(l)
True
>>> 
>>> dill.pickles(threading.Lock())
True
>>> f = open('whatever', 'w') 
>>> f.close()
>>> dill.pickles(open('whatever', 'r'))
True

Ok, dill atomically pickles all of your examples, so let's try something else:

>>> l = [iter([1,2,3]), xrange(5)]
>>> dill.pickles(l)
False

Ok, this fails. Now, let's investigate:

>>> dill.detect.trace(True)
>>> dill.pickles(l)
T4: <type 'listiterator'>
False
>>> map(dill.pickles, l)
T4: <type 'listiterator'>
Si: xrange(5)
F2: <function _eval_repr at 0x106991cf8>
[False, True]

Ok. We can see that the iter fails, but the xrange does pickle. So, let's replace the iter:

>>> l[0] = xrange(1,4)
>>> dill.pickles(l)
Si: xrange(1, 4)
F2: <function _eval_repr at 0x106991cf8>
Si: xrange(5)
True

Now our object atomically pickles.
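The element-by-element investigation done with map(dill.pickles, l) can be approximated with the standard library alone. The sketch below is Python 3 and uses hypothetical helpers (pickles, unpicklable_items) built on pickle.dumps. Note that the standard pickle is stricter than dill, so objects dill handled in the session above, such as a Lock, are reported as failures here.

```python
import pickle
import threading


def pickles(obj):
    """Hypothetical stdlib analogue of dill.pickles: True if dumps succeeds."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


def unpicklable_items(container):
    """Return (index, type name) for each element that fails to pickle."""
    return [
        (index, type(item).__name__)
        for index, item in enumerate(container)
        if not pickles(item)
    ]


l = [range(5), threading.Lock(), (i for i in range(3))]
print(unpicklable_items(l))
```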