Recombining objects after pickling and unpickling

Time: 2021-01-23 12:05:25

Tags: python python-3.x pickle persistent

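I have a situation where one party (Alice) has a complex custom object whose attributes are complicated and may involve circular references. Alice sends this object to two separate parties, Bob and Claire, by pickling it and sending it over an (encrypted) socket. Each of them then modifies one attribute of the object, but what they change includes complex references to the object they received from Alice. Bob and Claire then pickle their own modified objects and send them back to Alice.

The question is: how can Alice combine the changes made by Bob and Claire? Because object identity is lost on pickling/unpickling, the naive idea of copying the attributes that Bob or Claire created onto the original object does not work. I know how persistent_id() and persistent_load() work in pickling, but I would very much like to avoid writing rules by hand for every attribute in the object Alice creates. Partly because it is a big pile of nested and circularly referenced objects (some 10,000+ lines of them), and partly because I want the flexibility to modify the rest of the code without having to change how I pickle/unpickle every time (and test that properly).

Can it be done? Or do I have to swallow the bitter pill and handle the pickling "manually"?

Here is a minimal concrete example. Obviously the circular references could easily be removed here, or Bob and Claire could just send their values to Alice, but that is not so in my real case.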

import pickle


class Shared:
    pass


class Bob:
    pass


class Claire:
    pass


class Alice:

    def __init__(self):
        self.shared = Shared()
        self.bob = Bob()
        self.claire = Claire()

    def add_some_data(self, x, y):
        self.shared.bob = self.bob
        self.shared.claire = self.claire
        self.shared.x = x
        self.shared.y = y

    def bob_adds_data(self, extra):
        self.bob.value = self.shared.x + self.shared.y + extra

    def claire_adds_data(self, extra):
        self.claire.value = self.shared.x * self.shared.y * extra


# Done on Alice's side
original = Alice()
original.add_some_data(2, 3)
outgoing = pickle.dumps(original)


# Done on Bob's side
bobs_copy = pickle.loads(outgoing)
bobs_copy.bob_adds_data(4)
bobs_reply = pickle.dumps(bobs_copy)


# Done on Claire's side
claires_copy = pickle.loads(outgoing)
claires_copy.claire_adds_data(5)
claires_reply = pickle.dumps(claires_copy)


# Done on Alice's side
from_bob = pickle.loads(bobs_reply)
from_claire = pickle.loads(claires_reply)
original.bob = from_bob.bob
original.claire = from_claire.claire
# If the circular references were maintained, these two lines would print the same values;
# instead, the bottom line raises AttributeError because the reference is broken
print(original.bob.value, original.claire.value)
print(original.shared.bob.value, original.shared.claire.value)

1 answer:

Answer 0 (score: 0)

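Partial solution

I have a partial solution that works with some restrictions on the problem case.

Restrictions

The restriction is that Alice's object has only a single reference to Bob and to Claire, each at a known location. The latter two, however, can have arbitrarily complex references to themselves and to Alice, including circular, nested and recursive structures. A further requirement is that Bob has no reference to Claire and vice versa: this is quite natural if we require that the two objects can be updated independently and in any order.

In other words, Alice receives something from Bob that gets put in one tidy place. The difficulty is making the references contained in Bob match up with the correct objects that Alice holds, but nothing else in Alice herself needs to change. This is the use case I need, and it is not clear to me that the more general case is feasible if Bob and Claire can make arbitrary changes to Alice.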

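Idea

This works through a base class that creates a persistent ID which does not change over the object's lifetime, is preserved by pickling/unpickling, and is unique. Any object whose references are to be maintained in this scenario must inherit from this class. When Bob sends his changes to Alice, he pickles using a dictionary of all the objects he received from Alice and their persistent IDs, so that every reference to a pre-existing object is encoded by its persistent ID. On the other side, Alice does the same. She unpickles what Bob sent her using a dictionary from persistent ID to object, covering everything she previously sent to Bob. So although Alice and Bob hold different instances of everything, the persistent IDs of certain objects are the same, which lets them be "swapped" when pickling between the parties.

This can be used with existing code quite easily. It only involves adding a base class to every custom class we want to be persistent, plus a small addition every time we pickle/unpickle.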
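Before the full module, the swap-by-ID idea can be sketched in isolation. This is a minimal illustration, not the module itself: `Node`, `ByIdPickler`, `ByIdUnpickler`, `roundtrip` and the ID string `"node-1"` are names made up for this sketch. Only `pickle.Pickler.persistent_id` and `pickle.Unpickler.persistent_load` are the real hooks.

```python
import io
import pickle


class Node:
    """Hypothetical stand-in for an object that both parties already hold."""

    def __init__(self, name):
        self.name = name


class ByIdPickler(pickle.Pickler):
    """Writes any object listed in obj_to_id as its ID instead of by value."""

    def __init__(self, file, obj_to_id):
        super().__init__(file)
        self.obj_to_id = obj_to_id

    def persistent_id(self, obj):
        # Only Node instances are looked up, so unhashable objects never reach the dict
        if isinstance(obj, Node):
            return self.obj_to_id.get(obj)  # None means "pickle normally"
        return None


class ByIdUnpickler(pickle.Unpickler):
    """Resolves each ID back to the receiver's own instance."""

    def __init__(self, file, id_to_obj):
        super().__init__(file)
        self.id_to_obj = id_to_obj

    def persistent_load(self, pid):
        if pid in self.id_to_obj:
            return self.id_to_obj[pid]
        raise pickle.UnpicklingError(f"unknown persistent id: {pid!r}")


def roundtrip(payload, senders_obj, receivers_obj, pid="node-1"):
    """Pickle on the sender's side, unpickle on the receiver's side."""
    buffer = io.BytesIO()
    ByIdPickler(buffer, {senders_obj: pid}).dump(payload)
    return ByIdUnpickler(io.BytesIO(buffer.getvalue()), {pid: receivers_obj}).load()


bobs_node = Node("shared")    # Bob's copy
alices_node = Node("shared")  # Alice's original, a different instance
received = roundtrip([bobs_node, bobs_node], bobs_node, alices_node)
print(received[0] is alices_node, received[1] is alices_node)
```

Both references inside the payload come back as Alice's own instance, even though Bob pickled his copy. The module below does the same thing, but builds the two dictionaries automatically.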

Module

import io
import time
import pickle


class Persistent:

    def __init__(self):
        """Both unique and unchanging, even after modifying or pickling/unpickling object
        Potential problem if clocks are not in sync"""
        self.persistent_id = str(id(self)) + str(time.time())


def make_persistent_memo(obj):
    """Makes two dictionaries (one reverse of other) linking every instance of Persistent found
    in the attributes and collections of obj recursively, with the persistent id of that instance.
    Can cope with circular references and recursively nested objects"""

    def add_to_memo(item, id_to_obj, obj_to_id, checked):

        # Prevents checking the same object multiple times
        if id(item) in checked:
            return id_to_obj, obj_to_id, checked
        else:
            checked.add(id(item))

            if isinstance(item, Persistent):
                id_to_obj[item.persistent_id] = item
                obj_to_id[item] = item.persistent_id

        try:  # Try to add attributes of item to memo, recursively
            for sub_item in vars(item).values():
                add_to_memo(sub_item, id_to_obj, obj_to_id, checked)
        except TypeError:
            pass

        try:  # Try to add iterable elements of item to memo, recursively
            for sub_item in item:
                add_to_memo(sub_item, id_to_obj, obj_to_id, checked)
        except TypeError:
            pass

        return id_to_obj, obj_to_id, checked

    return add_to_memo(obj, {}, {}, set())[:2]


class PersistentPickler(pickle.Pickler):
    """ Normal pickler, but it takes a memo of the form {obj: persistent id}
    any object in that memo is pickled as its persistent id instead"""

    @staticmethod  # Because dumps is not defined for custom Picklers
    def dumps(obj_to_id_memo, obj):
        with io.BytesIO() as file:
            PersistentPickler(file, obj_to_id_memo).dump(obj)
            file.seek(0)
            return file.read()

    def __init__(self, file, obj_to_id_memo):
        super().__init__(file)
        self.obj_to_id_memo = obj_to_id_memo

    def persistent_id(self, obj):
        try:
            if obj in self.obj_to_id_memo and obj:
                return self.obj_to_id_memo[obj]
        except TypeError:  # If obj is unhashable
            pass
        return None


class PersistentUnPickler(pickle.Unpickler):
    """ Normal unpickler, but it takes a memo of the form {persistent id: obj}
    used to undo the effects of PersistentPickler"""

    @staticmethod  # Because loads is not defined for custom Unpicklers
    def loads(id_to_obj_memo, pickled_data):
        with io.BytesIO(pickled_data) as file:
            obj = PersistentUnPickler(file, id_to_obj_memo).load()
        return obj

    def __init__(self, file, id_to_obj_memo):
        super().__init__(file)
        self.id_to_obj_memo = id_to_obj_memo

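    def persistent_load(self, pid):
        if pid in self.id_to_obj_memo:
            return self.id_to_obj_memo[pid]
        # Unknown persistent id: raise explicitly, as the pickle documentation's example does
        raise pickle.UnpicklingError(f"unsupported persistent id: {pid!r}")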

Usage example

class Alice(Persistent):
    """ Must have a single attribute saved as bob or claire """

    def __init__(self):
        super().__init__()
        self.shared = Shared()
        self.bob = Bob()
        self.claire = Claire()

    def add_some_data(self, x, y):
        self.nested = [self]
        self.nested.append(self.nested)
        self.shared.x = x
        self.shared.y = y


class Bob(Persistent):
    """ Can have arbitrary reference to itself and to Alice but must not touch Claire """

    def make_changes(self, alice, extra):
        self.value = alice.shared.x + alice.shared.y + extra
        self.attribute = alice.shared
        self.collection = [alice.bob, alice.shared]
        self.collection.append(self.collection)
        self.new = Shared()


class Claire(Persistent):
    """ Can have arbitrary reference to itself and to Alice but must not touch Bob """

    def make_changes(self, alice, extra):
        self.value = alice.shared.x * alice.shared.y * extra
        self.attribute = alice
        self.collection = {"claire": alice.claire, "shared": alice.shared}
        self.collection["collection"] = self.collection


class Shared(Persistent):
    pass


# Done on Alice's side
alice = Alice()
alice.add_some_data(2, 3)
outgoing = pickle.dumps(alice)

# Done on Bob's side
bobs_copy = pickle.loads(outgoing)
# Create a memo of the persistent_id of the received objects that are *not* being modified
_, bob_pickling_memo = make_persistent_memo(bobs_copy)
bob_pickling_memo.pop(bobs_copy.bob)
# Make changes and send everything back to Alice
bobs_copy.bob.make_changes(bobs_copy, 4)
bobs_reply = PersistentPickler.dumps(bob_pickling_memo, bobs_copy.bob)


# Same on Claire's side
claires_copy = pickle.loads(outgoing)

_, claire_pickling_memo = make_persistent_memo(claires_copy)
claire_pickling_memo.pop(claires_copy.claire)

claires_copy.claire.make_changes(claires_copy, 5)
claires_reply = PersistentPickler.dumps(claire_pickling_memo, claires_copy.claire)


# Done on Alice's side
alice_unpickling_memo, _ = make_persistent_memo(alice)
alice.bob = PersistentUnPickler.loads(alice_unpickling_memo, bobs_reply)
alice.claire = PersistentUnPickler.loads(alice_unpickling_memo, claires_reply)

# Check that Alice has received changes from Bob and Claire
print(alice.bob.value == bobs_copy.bob.value == 9,
      alice.claire.value == claires_copy.claire.value == 30)
# Check that all references match up as expected
print("Alice:", alice is alice.nested[0] is alice.nested[1][0] is alice.claire.attribute)

print("Bob:", (alice.bob is alice.nested[0].bob is alice.bob.collection[0] is
               alice.bob.collection[2][0]))

print("Claire:", (alice.claire is alice.nested[0].claire is alice.claire.collection["claire"] is
                  alice.claire.collection["collection"]["claire"]))

print("Shared:", (alice.shared is alice.bob.attribute is alice.bob.collection[1] is
                  alice.bob.collection[2][1] is alice.claire.collection["shared"] is
                  alice.claire.collection["collection"]["shared"] is not alice.bob.new))

Output

C>python test.py
True True
Alice: True
Bob: True
Claire: True
Shared: True

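Exactly as required.

Follow-up

It feels like I am reinventing the wheel here with my own nested introspection; can this be done better with existing tools?

My code feels quite inefficient, with a lot of introspection; can it be improved?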

Can I be sure that add_to_memo() is not missing some references?
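One gap can be seen by reading the traversal: iterating over a dict yields only its keys, so a Persistent instance that is reachable only as a dict value is never visited. The sketch below is one possible way to narrow that gap, not a guarantee of completeness. `collect_persistent` is a hypothetical helper, and the `Persistent` class here is a stand-in that takes a fixed ID to keep the demo deterministic. Objects using `__slots__` and custom container types are still not covered.

```python
from collections.abc import Mapping


class Persistent:
    """Stand-in for the base class above, with a fixed ID for the demo."""

    def __init__(self, persistent_id):
        self.persistent_id = persistent_id


def collect_persistent(root):
    """Return {persistent_id: obj} for every Persistent reachable from root.

    Hypothetical helper: iterative, and it follows dict values as well as keys.
    """
    found, seen, stack = {}, set(), [root]
    while stack:
        item = stack.pop()
        if id(item) in seen:  # already visited, also breaks cycles
            continue
        seen.add(id(item))
        if isinstance(item, Persistent):
            found[item.persistent_id] = item
        if isinstance(item, (str, bytes, bytearray)):
            continue  # nothing to find inside text or raw bytes
        if hasattr(item, "__dict__"):
            stack.extend(vars(item).values())
        if isinstance(item, Mapping):
            stack.extend(item.keys())
            stack.extend(item.values())
        elif isinstance(item, (list, tuple, set, frozenset)):
            stack.extend(item)
    return found


holder = Persistent("holder")
hidden = Persistent("hidden")
holder.lookup = {"key": hidden}  # reachable only as a dict value
holder.cycle = [holder]          # circular reference, must not loop forever

print(sorted(collect_persistent(holder)))
```

Using an explicit stack instead of recursion also avoids hitting the recursion limit on deeply nested structures, which may matter for an object graph of the size described in the question.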

Using time.time() to create the persistent ID feels rather clunky; is there a better option?
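One candidate is the standard library's `uuid` module. The sketch below assumes a random identifier is acceptable for this use case and changes only how the ID is generated. It stays an ordinary instance attribute, so it is pickled together with the rest of the object's state just as before.

```python
import uuid


class Persistent:
    """Variant of the base class above; only the ID generation differs."""

    def __init__(self):
        # Random 128-bit value: independent of clocks and memory addresses
        self.persistent_id = uuid.uuid4().hex


ids = {Persistent().persistent_id for _ in range(1000)}
print(len(ids), len(next(iter(ids))))
```

This sidesteps the clock-synchronisation concern noted in the original docstring, and it does not rely on `id(self)`, which is only unique among objects that exist at the same time.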
