How do I run cleanup code in a Python multiprocessing Pool?

Asked: 2011-05-20 17:17:47

Tags: python multiprocessing

I have some Python code (on Windows) that uses the multiprocessing module to run a pool of worker processes. Each worker needs to do some cleanup when the map_async call finishes.

Does anyone know how to do this?

2 Answers:

Answer 0 (score: 4)

Do you really want to run a cleanup function once for each worker process, rather than once for each task created by the map_async call?

multiprocessing.pool.Pool creates a pool of, say, 8 worker processes. map_async might submit 40 tasks to be distributed among those 8 workers. I can imagine why you might want to run cleanup code at the end of each task, but I'm having a hard time imagining why you would want to run cleanup code just before each of the 8 worker processes is finalized.

Nevertheless, if that is what you want to do, you can do it by monkeypatching multiprocessing.pool.worker:

import multiprocessing as mp
import multiprocessing.pool as mpool
from multiprocessing.util import debug

def cleanup():
    print('{n} CLEANUP'.format(n=mp.current_process().name))

# This code comes from /usr/lib/python2.6/multiprocessing/pool.py,
# except for the single line at the end which calls cleanup().
def myworker(inqueue, outqueue, initializer=None, initargs=()):
    put = outqueue.put
    get = inqueue.get
    if hasattr(inqueue, '_writer'):
        inqueue._writer.close()
        outqueue._reader.close()

    if initializer is not None:
        initializer(*initargs)

    while 1:
        try:
            task = get()
        except (EOFError, IOError):
            debug('worker got EOFError or IOError -- exiting')
            break

        if task is None:
            debug('worker got sentinel -- exiting')
            break

        job, i, func, args, kwds = task
        try:
            result = (True, func(*args, **kwds))
        except Exception as e:
            result = (False, e)
        put((job, i, result))
    cleanup()

# Here we monkeypatch mpool.worker
mpool.worker = myworker

def foo(i):
    return i*i

def main():
    pool = mp.Pool(8)
    results = pool.map_async(foo, range(40)).get()
    print(results)

if __name__=='__main__':
    main()

yields:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521]
PoolWorker-8 CLEANUP
PoolWorker-3 CLEANUP
PoolWorker-7 CLEANUP
PoolWorker-1 CLEANUP
PoolWorker-6 CLEANUP
PoolWorker-2 CLEANUP
PoolWorker-4 CLEANUP
PoolWorker-5 CLEANUP

Answer 1 (score: 3)

Your only real option here is to run cleanup at the end of the function you map_async.

If this cleanup is genuinely tied to the death of a process, you cannot use the pool abstraction: process lifetime is orthogonal to it. A pool makes no guarantees about when its processes live or die, unless you use maxtasksperchild, which is new in 2.7 — and even then you still cannot run code in a process as it dies. However, maxtasksperchild might suit you, because any resources a process had open are certainly gone once the process terminates.

That said, if you have a bunch of functions that all need cleanup, you can avoid duplicating the work with a decorator. Here's an example of what I mean:

import functools
import multiprocessing

def cleanup(f):
    """Decorator for shared cleanup mechanism"""
    @functools.wraps(f)
    def wrapped(arg):
        result = f(arg)
        print("Cleaning up after f({0})".format(arg))
        return result
    return wrapped

@cleanup
def task1(arg):
    print("Hello from task1({0})".format(arg))
    return arg * 2

@cleanup
def task2(arg):
    print("Bonjour from task2({0})".format(arg))
    return arg ** 2

def main():
    p = multiprocessing.Pool(processes=3)
    print(p.map(task1, [1, 2, 3]))
    print(p.map(task2, [1, 2, 3]))

if __name__ == "__main__":
    main()

When you execute this (barring stdout jumbling, which I'm not locking against here for brevity's sake), the ordering of the output should make it clear that the cleanup runs at the end of each task:

Hello from task1(1)
Cleaning up after f(1)
Hello from task1(2)
Cleaning up after f(2)
Hello from task1(3)
Cleaning up after f(3)
[2, 4, 6]

Bonjour from task2(1)
Cleaning up after f(1)
Bonjour from task2(2)
Cleaning up after f(2)
Bonjour from task2(3)
Cleaning up after f(3)
[1, 4, 9]