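Suppose you are using a multiprocessing.Pool object, and you use the initializer argument of its constructor to pass an initializer function that creates a resource in the global namespace. Assume the resource has a context manager. How would you handle the life cycle of the context-managed resource, given that it has to live for the whole life of the process but must still be cleaned up properly at the end?
So far, I have something like this: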
resource_cm = None
resource = None

def _worker_init(args):
    # Both names must be declared global, otherwise resource_cm stays local
    global resource_cm, resource
    resource_cm = open_resource(args)
    resource = resource_cm.__enter__()
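From here on, the pool processes can use the resource. So far so good. Handling the cleanup is trickier, though, because the multiprocessing.Pool class does not provide a destructor or deinitializer argument.
One of my ideas is to use the atexit module and register the cleanup in the initializer. Something like this: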
def _worker_init(args):
    global resource_cm, resource
    resource_cm = open_resource(args)
    resource = resource_cm.__enter__()

    def _clean_up():
        # __exit__ expects (exc_type, exc_value, traceback)
        resource_cm.__exit__(None, None, None)

    import atexit
    atexit.register(_clean_up)
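Is this a good approach? Is there an easier way of doing this?
EDIT: atexit doesn't seem to work. At least not in the way I'm using it above, so as of right now I still don't have a solution for this problem.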
Answer 0 (score: 28)
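First, this is a really great question! After digging around a bit in the multiprocessing code, I think I've found a way to do this:
When you start a multiprocessing.Pool, the Pool object internally creates a multiprocessing.Process object for each member of the pool. When those sub-processes start up, they call a _bootstrap function, which looks like this: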
def _bootstrap(self):
    from . import util
    global _current_process
    try:
        # ... (stuff we don't care about)
        util._finalizer_registry.clear()
        util._run_after_forkers()
        util.info('child process calling self.run()')
        try:
            self.run()
            exitcode = 0
        finally:
            util._exit_function()
        # ... (more stuff we don't care about)
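The run method is what actually runs the target you gave the Process object. For a Pool process, that target is a method with a long-running while loop that waits for work items to come in over an internal queue. What's really interesting for us is what happens after self.run: util._exit_function() is called.
As it turns out, that function does some cleanup that sounds a lot like what you're looking for: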
def _exit_function(info=info, debug=debug, _run_finalizers=_run_finalizers,
                   active_children=active_children,
                   current_process=current_process):
    # NB: we hold on to references to functions in the arglist due to the
    # situation described below, where this function is called after this
    # module's globals are destroyed.
    global _exiting
    info('process shutting down')
    debug('running all "atexit" finalizers with priority >= 0')  # Very interesting!
    _run_finalizers(0)
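Note that the "atexit" in that debug message refers to multiprocessing's own finalizer registry, not the standard atexit module. A likely reason the atexit attempt in the question did nothing, assuming the fork start method, is that a forked child ends with os._exit() once _bootstrap returns, and os._exit() skips handlers registered through atexit. The sketch below shows that difference with a plain subprocess rather than a pool worker; the child script and the run_child helper are made up for illustration.

```python
import subprocess
import sys

# Hypothetical child script: registers an atexit handler, then exits
# using whichever call is substituted for {exit_call}.
CHILD_TEMPLATE = """
import atexit, os, sys
atexit.register(print, "atexit handler ran")
{exit_call}
"""


def run_child(exit_call):
    """Run the child script in a fresh interpreter and return its stdout."""
    code = CHILD_TEMPLATE.format(exit_call=exit_call)
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return result.stdout


normal_output = run_child("sys.exit(0)")  # normal interpreter shutdown
hard_output = run_child("os._exit(0)")    # immediate exit, no atexit handlers

print(repr(normal_output))
print(repr(hard_output))
```

Only the first child prints the handler's message, which matches the behaviour reported in the question's edit.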
Here's the docstring of _run_finalizers:
def _run_finalizers(minpriority=None):
    '''
    Run all finalizers whose exit priority is not None and at least minpriority

    Finalizers with highest priority are called first; finalizers with
    the same priority will be called in reverse order of creation.
    '''
The method runs through a list of finalizer callbacks and executes each of them:
    items = [x for x in _finalizer_registry.items() if f(x)]
    items.sort(reverse=True)

    for key, finalizer in items:
        sub_debug('calling %s', finalizer)
        try:
            finalizer()
        except Exception:
            import traceback
            traceback.print_exc()
Perfect. So how do we get into the _finalizer_registry? There's an undocumented object called Finalize in multiprocessing.util that is responsible for adding a callback to the registry:
class Finalize(object):
    '''
    Class which supports object finalization using weakrefs
    '''
    def __init__(self, obj, callback, args=(), kwargs=None, exitpriority=None):
        assert exitpriority is None or type(exitpriority) is int

        if obj is not None:
            self._weakref = weakref.ref(obj, self)
        else:
            assert exitpriority is not None

        self._callback = callback
        self._args = args
        self._kwargs = kwargs or {}
        self._key = (exitpriority, _finalizer_counter.next())
        self._pid = os.getpid()

        _finalizer_registry[self._key] = self  # That's what we're looking for!
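To get a feel for the ordering described in the docstring, here is a small single-process sketch. It registers three finalizers and then calls _run_finalizers(0) by hand, which is roughly what util._exit_function() does in a worker. Both Finalize and _run_finalizers are undocumented internals, so treat this as an experiment rather than a supported API; the labels are arbitrary.

```python
from multiprocessing import util

calls = []

# obj=None means "not tied to any object"; exitpriority must then be set.
util.Finalize(None, calls.append, args=("low",), exitpriority=1)
util.Finalize(None, calls.append, args=("high",), exitpriority=10)
util.Finalize(None, calls.append, args=("high-later",), exitpriority=10)

# Run every finalizer with priority >= 0, as _exit_function() would.
util._run_finalizers(0)

print(calls)  # ['high-later', 'high', 'low']
```

Higher priorities run first, and among equal priorities the most recently created finalizer runs first.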
Ok, so putting it all together into an example:
import multiprocessing
from multiprocessing.util import Finalize

resource_cm = None
resource = None

class Resource(object):
    def __init__(self, args):
        self.args = args

    def __enter__(self):
        print("in __enter__ of %s" % multiprocessing.current_process())
        return self

    def __exit__(self, *args, **kwargs):
        print("in __exit__ of %s" % multiprocessing.current_process())

def open_resource(args):
    return Resource(args)

def _worker_init(args):
    global resource_cm, resource
    print("calling init")
    resource_cm = open_resource(args)
    resource = resource_cm.__enter__()
    # Register a finalizer
    Finalize(resource, resource.__exit__, exitpriority=16)

def hi(*args):
    print("we're in the worker")

if __name__ == "__main__":
    pool = multiprocessing.Pool(initializer=_worker_init, initargs=("abc",))
    pool.map(hi, range(pool._processes))
    pool.close()
    pool.join()
Output:
calling init
in __enter__ of <Process(PoolWorker-1, started daemon)>
calling init
calling init
in __enter__ of <Process(PoolWorker-2, started daemon)>
in __enter__ of <Process(PoolWorker-3, started daemon)>
calling init
in __enter__ of <Process(PoolWorker-4, started daemon)>
we're in the worker
we're in the worker
we're in the worker
we're in the worker
in __exit__ of <Process(PoolWorker-1, started daemon)>
in __exit__ of <Process(PoolWorker-2, started daemon)>
in __exit__ of <Process(PoolWorker-3, started daemon)>
in __exit__ of <Process(PoolWorker-4, started daemon)>
As you can see, __exit__ gets called in each of our workers when we join() the pool.
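A possible variation, sketched under a few assumptions: in the general case __exit__ expects three arguments, and Finalize(resource, ...) needs a resource that can be weakly referenced. Entering the context manager through contextlib.ExitStack and registering stack.close with obj=None sidesteps both. The open_resource below is a made-up stand-in that records its life cycle in a list, and the worker shutdown is simulated in the current process by calling the private _run_finalizers directly.

```python
import contextlib
from multiprocessing import util

resource = None  # set once per process by _worker_init


@contextlib.contextmanager
def open_resource(name, log):
    # Hypothetical resource: it only records when it opens and closes.
    log.append("open " + name)
    try:
        yield name.upper()
    finally:
        log.append("close " + name)


def _worker_init(name, log):
    global resource
    stack = contextlib.ExitStack()
    resource = stack.enter_context(open_resource(name, log))
    # stack.close() calls __exit__(None, None, None) on everything entered.
    # The Finalize object keeps a reference to stack.close, so the stack
    # stays alive until the finalizer runs.
    util.Finalize(None, stack.close, exitpriority=16)


events = []
_worker_init("abc", events)

# Simulated shutdown; in a real pool worker util._exit_function() does this.
util._run_finalizers(0)

print(resource)  # ABC
print(events)    # ['open abc', 'close abc']
```

In a real pool you would pass _worker_init through initializer= and initargs= as in the example above; the shared events list exists only to make the single-process demonstration observable.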
Answer 1 (score: 4)
You can subclass Process and override its run() method so that it performs cleanup before exiting. Then you should subclass Pool so that it uses your subclassed process:
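from multiprocessing import Process
from multiprocessing.pool import Pool

class SafeProcess(Process):
    """ Process that will cleanup before exit """
    def run(self, *args, **kw):
        result = super().run(*args, **kw)
        # cleanup however you want here
        return result

class SafePool(Pool):
    # Note: this relies on Pool.Process being a plain class attribute.
    # Python 3.8+ calls Pool.Process with a context as its first argument,
    # so this override may need adapting there.
    Process = SafeProcess

pool = SafePool(4)  # use it as standard Pool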
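On newer Python versions, one way to get the same effect without overriding Pool.Process is to hand the pool a custom context through its context= argument. This is only a sketch of that idea, not the answer's original code: SafeContext, cleanup and square are names invented here, and it assumes workers exit normally (close() then join()), since a terminated worker never reaches the finally block.

```python
import multiprocessing
import multiprocessing.pool
import os

# Platform default context (fork, spawn or forkserver).
_ctx = multiprocessing.get_context()


class SafeProcess(_ctx.Process):
    """Process that runs a cleanup hook after run(), even on error."""

    def run(self):
        try:
            super().run()
        finally:
            self.cleanup()

    def cleanup(self):
        # Placeholder: release per-process resources here.
        print("cleanup in pid %d" % os.getpid())


class SafeContext(type(_ctx)):
    """Context whose pools create SafeProcess workers."""
    Process = SafeProcess


def square(x):
    return x * x


if __name__ == "__main__":
    pool = multiprocessing.pool.Pool(2, context=SafeContext())
    results = pool.map(square, range(4))
    pool.close()
    pool.join()  # workers leave run() here, so cleanup() fires
    print(results)
```

Deriving SafeProcess from the context's own Process class keeps the start method consistent between the pool and its workers.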