Question

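I am writing a piece of software in which I want to share an object from a certain module. The object should be modifiable from different modules and from within different processes. Consider the following (simplified) version of the problem: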

Modules

module_shared.py

# Example class with simplified behaviour
class Shared:

    def __init__(self):
        self.shared = dict()

    def set(self, **kwargs):
        for key, value in kwargs.items():
            self.shared[key] = value

    def get(self, *args):
        return {key: self.shared[key] for key in args} if args else self.shared

# Module-scope instance of the Shared class
shared = Shared()

module_a.py

from multiprocessing import Process
from time import sleep
import module_shared as ms

def run():
    Process(target=run_process).start()

def run_process():
    i = 0
    while True:
        sleep(3)
        ms.shared.set(module_a=i)
        i+=1
        print("Shared from within module_a", ms.shared.get())

module_b.py

from multiprocessing import Process
from time import sleep
import module_shared as ms


def run():
    Process(target=run_process).start()

def run_process():
    i = 0
    while True:
        sleep(2)
        ms.shared.set(module_b=i)
        i-=1
        print("Shared from within module_b", ms.shared.get())

module_main.py

import module_a
import module_b
import module_shared as ms
from time import sleep

if __name__ == '__main__':
    module_a.run()
    module_b.run()
    while True:
        sleep(5)
        print("Shared from within module_main", ms.shared.get())

Output

Running module_main produces the following output:

Shared from within module_b {'module_b': 0}
Shared from within module_a {'module_a': 0}
Shared from within module_b {'module_b': -1}
Shared from within module_main {}
Shared from within module_a {'module_a': 1}
Shared from within module_b {'module_b': -2}
...

The expected output is as follows:

Shared from within module_b {'module_b': 0}
Shared from within module_a {'module_a': 0, 'module_b': 0}
Shared from within module_b {'module_a': 0, 'module_b': -1}
Shared from within module_main {'module_a': 0, 'module_b': -1}
Shared from within module_a {'module_a': 1, 'module_b': -1}
Shared from within module_b {'module_a': 1, 'module_b': -2}
...

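Further explanation

The shared instance is not modified globally because each process has its own memory space. Initially I tried to fix this with the Manager from the multiprocessing module, but I failed to set it up, presumably because of when and how the import statements are executed. Calling Manager() in the __init__ of Shared raised an error; the error message is not reproduced here.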

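Currently the best solution I have is to use threads, but I would prefer to use processes. Naturally, if any simpler (or better) solutions exist, I would be happy to consider them.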
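The point about separate memory spaces can be illustrated with a minimal sketch. The names state, child and main are made up for this illustration and are not part of the modules above; the module-level dict simply plays the role of ms.shared.

```python
from multiprocessing import Process, Queue

# Module-level object, analogous to ms.shared (hypothetical name).
state = {}


def child(queue):
    # Runs in the child process and mutates that process's own copy of `state`.
    state["child"] = 1
    # Send a snapshot back so the parent can see what the child saw.
    queue.put(dict(state))


def main():
    queue = Queue()
    proc = Process(target=child, args=(queue,))
    proc.start()
    seen_by_child = queue.get()
    proc.join()
    print("child sees: ", seen_by_child)
    print("parent sees:", state)


if __name__ == "__main__":
    main()
```

The child reports a dict containing its own key, while the parent still prints an empty dict, which is the same effect as in the output shown earlier.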

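Edit:

I have realized that I made a typo in my previous attempt with threads, and that using multiple threads actually works fine. It really pays to learn to read your code twice...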

Answer 1

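One approach is to use one of the various caching modules. diskcache, shelve, etc. all offer persistent storage of objects. And, of course, there is pickle.

For example, using the diskcache library, you could take this approach, replacing your module_shared.py with: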

### DISKCACHE Example ###
from diskcache import Cache

cache = Cache('test_cache.cache')

# Example class with simplified behaviour
class Shared:

    def __init__(self, cache):
        self.cache = cache
        self.cache.clear()

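    def set(self, **kwargs):
        # Write each keyword argument to the on-disk cache
        for key, value in kwargs.items():
            self.cache.set(key, value)

    def get(self, *args):
        # With keys: return a dict holding just those keys.
        # Without keys: return a set of (key, value) tuples for every entry.
        if args:
            return {key: self.cache.get(key) for key in args}
        return {(key, self.cache.get(key)) for key in self.cache.iterkeys()}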


# Module-scope instance of the Shared class
shared = Shared(cache)

Output:

Shared from within module_b {('module_b', 0)}
Shared from within module_a {('module_a', 0), ('module_b', 0)}
Shared from within module_b {('module_a', 0), ('module_b', -1)}
Shared from within module_main {('module_a', 0), ('module_b', -1)}
Shared from within module_a {('module_b', -1), ('module_a', 1)}
Shared from within module_b {('module_b', -2), ('module_a', 1)}

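In the example above, module_shared.py is the only file that was changed.

Each of the various persistence libraries/approaches has its own quirks and capabilities. If you absolutely need to persist the whole class instance object, that option is there. :) Performance simply depends on your approach and your choice of caching mechanism. diskcache has proven very capable for me.

I have implemented diskcache very simply here to demonstrate its functionality. Be sure to read its documentation, which is clear and concise, for a better understanding.

Also, my output is unordered, because get() without arguments returns a set of (key, value) tuples. You could easily produce sorted output so that it consistently matches your own, with module_a always first. I left that out for simplicity.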
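A minimal sketch of that sorting step, assuming the set-of-tuples return value described here. The helper name as_sorted_dict is hypothetical and not part of diskcache.

```python
def as_sorted_dict(pairs):
    """Turn an iterable of (key, value) tuples into a dict ordered by key."""
    # Sort on the key only, so values never need to be comparable.
    return dict(sorted(pairs, key=lambda pair: pair[0]))


unordered = {("module_b", -1), ("module_a", 1)}
print(as_sorted_dict(unordered))  # {'module_a': 1, 'module_b': -1}
```

Wrapping the no-argument branch of get() in such a helper would also bring the output back to a dict, as in the question.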

Answer 2

Looking at the documentation for customized Manager objects, here is an idea.

Add these lines to module_shared.py:

from multiprocessing.managers import BaseManager

class SharedManager(BaseManager):
    pass

SharedManager.register('Shared', Shared)
manager = SharedManager()
manager.start()
shared = manager.Shared()

(Get rid of the old definition of shared.)

Running it on my computer produced:

$ python module_main.py 
Shared from within module_b {'module_b': 0}
Shared from within module_a {'module_b': 0, 'module_a': 0}
Shared from within module_b {'module_b': -1, 'module_a': 0}
Shared from within module_main {'module_b': -1, 'module_a': 0}
Shared from within module_a {'module_b': -1, 'module_a': 1}
Shared from within module_b {'module_b': -2, 'module_a': 1}
Shared from within module_b {'module_b': -3, 'module_a': 1}
Shared from within module_a {'module_b': -3, 'module_a': 2}
Shared from within module_main {'module_b': -3, 'module_a': 2}
Shared from within module_b {'module_b': -4, 'module_a': 2}
...etc

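To me, that looks like the expected result.

It is a bit odd that module_shared.py starts a process (the line manager.start()), since we normally don't expect modules to do anything on import, but given the constraints of the question I think it is the only way to do it. If I were writing this for myself, I would create the manager in module_main rather than in module_shared, in the same way as here (perhaps using the context manager described in the documentation linked above instead of the .start method), and I would pass that manager as a function argument to the run methods of module_a and module_b.

You may also be interested in SyncManager, which is a subclass of BaseManager that already has a lot of basic types registered, including dict, which basically covers the functionality here.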
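The alternative of creating the manager in the main module could look roughly like the single-file sketch below. It is only an illustration under some assumptions: the worker function and its arguments are hypothetical stand-ins for the run functions of module_a and module_b, the proxy object is passed to the workers rather than the manager itself, and the endless loops are replaced by a single update so the script terminates.

```python
from multiprocessing import Process
from multiprocessing.managers import BaseManager


class Shared:
    def __init__(self):
        self.shared = {}

    def set(self, **kwargs):
        self.shared.update(kwargs)

    def get(self, *args):
        # Return copies so callers never hold a reference to internal state.
        if args:
            return {key: self.shared[key] for key in args}
        return dict(self.shared)


class SharedManager(BaseManager):
    pass


SharedManager.register("Shared", Shared)


def worker(shared, name, value):
    # `shared` is a proxy here; the call is forwarded to the manager process.
    shared.set(**{name: value})


def main():
    # The context manager starts the server process and shuts it down on exit.
    with SharedManager() as manager:
        shared = manager.Shared()
        procs = [
            Process(target=worker, args=(shared, "module_a", 1)),
            Process(target=worker, args=(shared, "module_b", -1)),
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        print("Shared from within main", shared.get())


if __name__ == "__main__":
    main()
```

After both workers have been joined, the dict printed in main contains both the module_a and module_b keys. Keeping the manager behind the `if __name__ == "__main__":` guard also avoids starting a process as a side effect of importing a module.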
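As a rough sketch of that route: multiprocessing.Manager() returns a started SyncManager, and its dict() method gives a proxy that several processes can update. The worker function and key names below are made up for illustration, and the custom Shared class is dropped entirely in favour of a plain shared dict.

```python
from multiprocessing import Manager, Process


def worker(shared, name, value):
    # Item assignment on the proxy is forwarded to the manager process.
    shared[name] = value


def main():
    with Manager() as manager:
        shared = manager.dict()
        procs = [
            Process(target=worker, args=(shared, "module_a", 1)),
            Process(target=worker, args=(shared, "module_b", -1)),
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        # Copy into a regular dict before the manager shuts down.
        print("Shared from within main", dict(shared))


if __name__ == "__main__":
    main()
```

One caveat worth keeping in mind: mutating a nested regular object stored inside the proxy dict (for example appending to a list value) is not propagated automatically, so such values need to be reassigned or created as manager proxies themselves.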

Globally modifiable object shared across multiple processes and modules
