Question

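Suppose you are using a multiprocessing.Pool object, and you use the initializer argument of its constructor to pass an initializer function that creates a resource in the global namespace. Assume the resource has a context manager. How would you handle the life cycle of the context-managed resource, given that it has to live for the whole life of the worker process but must still be cleaned up properly at the end?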

So far, I have something like this:

resource_cm = None
resource = None


def _worker_init(args):
    # Both names are module-level globals, so both must be declared here
    global resource_cm, resource
    resource_cm = open_resource(args)
    resource = resource_cm.__enter__()

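From here on, the pool processes can use the resource. So far, so good. Handling clean-up is trickier, though, because the multiprocessing.Pool class does not provide a destructor or deinitializer argument.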

One of my ideas is to use the atexit module and register the clean-up in the initializer. Something like this:

import atexit


def _worker_init(args):
    global resource_cm, resource
    resource_cm = open_resource(args)
    resource = resource_cm.__enter__()

    def _clean_up():
        # __exit__ takes the exception triple; pass None for a normal exit
        resource_cm.__exit__(None, None, None)

    atexit.register(_clean_up)

Is this a good approach? Is there an easier way of doing this?

EDIT: atexit doesn't seem to work. At least not in the way I'm using it above, so as of right now I still don't have a solution for this problem.

Answer 1

First, this is a really great question! After digging around a bit in the multiprocessing code, I think I've found a way to do this:

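When you start a multiprocessing.Pool, the Pool object internally creates a multiprocessing.Process object for each member of the pool. When those sub-processes start up, they call a _bootstrap function, which looks like this: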

def _bootstrap(self):
    from . import util
    global _current_process
    try:
        # ... (stuff we don't care about)
        util._finalizer_registry.clear()
        util._run_after_forkers()
        util.info('child process calling self.run()')
        try:
            self.run()
            exitcode = 0 
        finally:
            util._exit_function()
        # ... (more stuff we don't care about)

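The run method is what actually runs the target you gave the Process object. For a Pool process, that target is a function with a long-running while loop that waits for work items to come in over an internal queue. What's really interesting for us is what happens after self.run: util._exit_function() is called.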
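As a rough mental model of that loop, here is a simplified, single-process sketch. It is not the real pool worker code: worker_loop, handle and on_exit are hypothetical names, a plain queue.Queue stands in for the pool's internal queue, and on_exit plays the role of util._exit_function().

```python
import queue


def worker_loop(inqueue, handle, on_exit):
    """Simplified stand-in for what a pool worker does inside run()."""
    try:
        while True:
            task = inqueue.get()
            if task is None:  # sentinel: no more work, leave the loop
                break
            handle(task)
    finally:
        # Stand-in for util._exit_function(), which runs after self.run()
        on_exit()


results, events = [], []
q = queue.Queue()
for item in (1, 2, 3, None):
    q.put(item)

worker_loop(q, results.append, lambda: events.append("exit hook ran"))
```

The point of the sketch is only that the exit hook runs once the loop ends normally, which is what makes it a candidate place for clean-up.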

As it turns out, that function does some clean-up that sounds a lot like what you're looking for:

def _exit_function(info=info, debug=debug, _run_finalizers=_run_finalizers,
                   active_children=active_children,
                   current_process=current_process):
    # NB: we hold on to references to functions in the arglist due to the
    # situation described below, where this function is called after this
    # module's globals are destroyed.

    global _exiting

    info('process shutting down')
    debug('running all "atexit" finalizers with priority >= 0')  # Very interesting!
    _run_finalizers(0)

Here's the docstring of _run_finalizers:

def _run_finalizers(minpriority=None):
    '''
    Run all finalizers whose exit priority is not None and at least minpriority

    Finalizers with highest priority are called first; finalizers with
    the same priority will be called in reverse order of creation.
    '''

The method actually runs through a list of finalizer callbacks and executes them:

    items = [x for x in _finalizer_registry.items() if f(x)]
    items.sort(reverse=True)

    for key, finalizer in items:
        sub_debug('calling %s', finalizer)
        try:
            finalizer()
        except Exception:
            import traceback
            traceback.print_exc()

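Perfect. So how do we get into the _finalizer_registry? There's an undocumented object called Finalize in multiprocessing.util that is responsible for adding a callback to the registry: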

class Finalize(object):
    '''
    Class which supports object finalization using weakrefs
    '''
    def __init__(self, obj, callback, args=(), kwargs=None, exitpriority=None):
        assert exitpriority is None or type(exitpriority) is int

        if obj is not None:
            self._weakref = weakref.ref(obj, self)
        else:
            assert exitpriority is not None

        self._callback = callback
        self._args = args
        self._kwargs = kwargs or {}
        self._key = (exitpriority, _finalizer_counter.next())
        self._pid = os.getpid()

        _finalizer_registry[self._key] = self  # That's what we're looking for!
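To see the ordering rule from the docstring in action, here is a small single-process sketch. It registers three callbacks with Finalize and then triggers them by hand. Calling the private util._run_finalizers directly is only for demonstration and assumes nothing else has registered finalizers in this process; normally _exit_function() makes that call at shutdown.

```python
from multiprocessing import util

calls = []

# obj=None registers a callback that is not tied to any object's lifetime,
# which is why an explicit exitpriority is required.
util.Finalize(None, calls.append, args=("low priority",), exitpriority=1)
util.Finalize(None, calls.append, args=("high priority, created first",),
              exitpriority=10)
util.Finalize(None, calls.append, args=("high priority, created second",),
              exitpriority=10)

# Demonstration only: run every finalizer whose priority is >= 0.
util._run_finalizers(0)

print(calls)
```

The higher priority runs first, and within the same priority the most recently created finalizer runs first.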

Ok, so putting it all together into an example:

import multiprocessing
from multiprocessing.util import Finalize

resource_cm = None
resource = None


class Resource(object):
    def __init__(self, args):
        self.args = args

    def __enter__(self):
        print("in __enter__ of %s" % multiprocessing.current_process())
        return self

    def __exit__(self, *args, **kwargs):
        print("in __exit__ of %s" % multiprocessing.current_process())


def open_resource(args):
    return Resource(args)


def _worker_init(args):
    global resource
    print("calling init")
    resource_cm = open_resource(args)
    resource = resource_cm.__enter__()
    # Register a finalizer
    Finalize(resource, resource.__exit__, exitpriority=16)


def hi(*args):
    print("we're in the worker")


if __name__ == "__main__":
    pool = multiprocessing.Pool(initializer=_worker_init, initargs=("abc",))
    pool.map(hi, range(pool._processes))
    pool.close()
    pool.join()

Output:

calling init
in __enter__ of <Process(PoolWorker-1, started daemon)>
calling init
calling init
in __enter__ of <Process(PoolWorker-2, started daemon)>
in __enter__ of <Process(PoolWorker-3, started daemon)>
calling init
in __enter__ of <Process(PoolWorker-4, started daemon)>
we're in the worker
we're in the worker
we're in the worker
we're in the worker
in __exit__ of <Process(PoolWorker-1, started daemon)>
in __exit__ of <Process(PoolWorker-2, started daemon)>
in __exit__ of <Process(PoolWorker-3, started daemon)>
in __exit__ of <Process(PoolWorker-4, started daemon)>

As you can see, __exit__ gets called in all our workers when we join() the pool.
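If the resource is an arbitrary context manager, the same idea can be wrapped in a small helper. This is a minimal sketch, not part of multiprocessing: enter_for_process_lifetime is a hypothetical name, the default priority of 16 is simply carried over from the example above, and the built-in open() stands in for open_resource. It uses contextlib.ExitStack so that __exit__ receives its three arguments properly, and it still relies on the undocumented Finalize class.

```python
from contextlib import ExitStack
from multiprocessing.util import Finalize

resource = None  # set once per worker by _worker_init


def enter_for_process_lifetime(cm, exitpriority=16):
    """Enter `cm` now; close it when multiprocessing runs its finalizers.

    Hypothetical helper built on the undocumented Finalize class.
    """
    stack = ExitStack()
    value = stack.enter_context(cm)
    # obj=None: the callback is not tied to an object's lifetime and runs
    # only when finalizers with a priority are executed at shutdown.
    Finalize(None, stack.close, exitpriority=exitpriority)
    return value


def _worker_init(path):
    # Assumption for illustration: the resource is a file opened for reading.
    global resource
    resource = enter_for_process_lifetime(open(path))
```

It would be passed to the pool the same way as before, for example multiprocessing.Pool(initializer=_worker_init, initargs=(some_path,)), where some_path is whatever your resource needs.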

Answer 2

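You can subclass Process and override its run() method so that it performs clean-up before exiting. Then you should subclass Pool so that it uses your subclassed process: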

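from multiprocessing import Process
from multiprocessing.pool import Pool


class SafeProcess(Process):
    """Process that cleans up before it exits."""

    def run(self):
        try:
            super().run()
        finally:
            # clean up however you want here
            pass


class SafePool(Pool):
    # Pool.Process is an undocumented hook. On Python 3.8+ the pool calls it
    # as Process(ctx, *args, **kwds), so a bare `Process = SafeProcess`
    # (enough on older versions) would receive ctx as its first argument.
    @staticmethod
    def Process(ctx, *args, **kwds):
        return SafeProcess(*args, **kwds)


if __name__ == "__main__":
    pool = SafePool(4)  # use it as a standard Pool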
