Unexpected behavior of multiprocessing Pool map_async

Date: 2018-02-19 16:47:39

Tags: python python-3.x multiprocessing python-multiprocessing

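I have some code that performs the same operation on several files in a Python 3 application, so it seems like a good candidate for multiprocessing. I am trying to use Pool to distribute the work across a number of processes. I want the code to keep doing other things (mainly displaying things to the user) while these computations run, so I would like to use the map_async function of the multiprocessing.Pool class for this. I expected that after calling it the code would continue and the results would be handled by the callback I specified, but that does not seem to happen. The following code shows three ways I have tried calling map_async and the results I saw: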

import multiprocessing
import time  # needed for time.sleep() in func()

NUM_PROCS = 4
def func(arg_list):
    arg1 = arg_list[0]
    arg2 = arg_list[1]
    print('start func')
    print('arg1 = {0}'.format(arg1))
    print('arg2 = {0}'.format(arg2))
    time.sleep(1)
    result1 = arg1 * arg2
    print('end func')
    return result1

def callback(result):
    print('result is {0}'.format(result))


def error_handler(error1):
    print('error in call\n {0}'.format(error1))


def async1(arg_list1):
    # This is how my understanding of map_async suggests I should
    # call it. When I execute this, the target function func() is not called
    with multiprocessing.Pool(NUM_PROCS) as p1:
        r1 = p1.map_async(func,
                          arg_list1,
                          callback=callback,
                          error_callback=error_handler)


def async2(arg_list1):
    with multiprocessing.Pool(NUM_PROCS) as p1:
        # If I call the wait function on the result for a small
        # amount of time, then the target function func() is called
        # and executes successfully in 2 processes, but the callback
        # function is never called so the results are not processed
        r1 = p1.map_async(func,
                          arg_list1,
                          callback=callback,
                          error_callback=error_handler)
        r1.wait(0.1)


def async3(arg_list1):
    # if I explicitly call join on the pool, then the target function func()
    # successfully executes in 2 processes and the callback function is also
    # called, but by calling join the processing is not asynchronous any more
    # as join blocks the main process until the other processes are finished.
    with multiprocessing.Pool(NUM_PROCS) as p1:
        r1 = p1.map_async(func,
                          arg_list1,
                          callback=callback,
                          error_callback=error_handler)
        p1.close()
        p1.join()


def main():
    arg_list1 = [(5, 3), (7, 4), (-8, 10), (4, 12)]
    async3(arg_list1)

    print('pool executed successfully')


if __name__ == '__main__':
    main()

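When async1, async2, or async3 is called from main, the results are as described in the comments of each function. Can anyone explain why the different calls behave the way they do? Ultimately I want to call map_async as in async1, so that I can do other things in the main process while the worker processes are busy. I have tested this code with Python 2.7 and 3.6, on an older RH6 Linux box and a newer Ubuntu VM, with the same results.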

1 answer:

Answer 0 (score: 3)

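This is happening because when you use multiprocessing.Pool as a context manager, pool.terminate() is called when you leave the with block, which immediately exits all the workers without waiting for in-progress tasks to finish.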

  

New in version 3.3: Pool objects now support the context management protocol - see Context Manager Types. __enter__() returns the pool object, and __exit__() calls terminate().
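The documented behavior can be observed with a small sketch. RecordingPool and leave_with_block are hypothetical names introduced here for illustration only; the subclass merely notes when terminate() is invoked and otherwise defers to the standard Pool.

```python
import multiprocessing.pool


class RecordingPool(multiprocessing.pool.Pool):
    """Hypothetical Pool subclass that notes when terminate() is called."""

    terminate_called = False  # class-level default, overridden per instance

    def terminate(self):
        self.terminate_called = True
        super().terminate()


def leave_with_block():
    """Enter and leave a `with` block; report the flag inside and afterwards."""
    with RecordingPool(1) as pool:
        inside = pool.terminate_called  # read while still inside the block
    # __exit__ has run by this point
    return inside, pool.terminate_called


if __name__ == '__main__':
    print(leave_with_block())
```

No task is submitted here; the point is only that leaving the with block triggers terminate(), which is what cuts off the pending map_async work in async1 and async2.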

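IMO using terminate() as the context manager's __exit__ method is not a great design choice, since most people would intuitively expect close() to be called, which would wait for in-progress tasks to complete before exiting. Unfortunately, all you can do is either refactor your code away from using a context manager, or refactor it so that you are guaranteed not to leave the with block until the Pool has finished its work.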
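A minimal sketch of the first option (no context manager) might look like the following. The names multiply, report_error, and run_without_blocking are hypothetical, and the polling loop is only a stand-in for whatever the main process would do in the meantime, such as updating a display.

```python
import multiprocessing
import time


def multiply(pair):
    """Worker: simulate slow work, then return the product of a pair."""
    a, b = pair
    time.sleep(0.2)
    return a * b


def report_error(exc):
    """Error callback: runs in the main process if a worker raised."""
    print('error in call\n {0}'.format(exc))


def run_without_blocking(pairs, num_procs=4):
    """Start map_async, keep the main process busy, then collect results."""
    collected = []  # filled by the callback, which runs in the main process

    # No `with` block here, so terminate() is not called behind our back.
    pool = multiprocessing.Pool(num_procs)
    try:
        async_result = pool.map_async(multiply,
                                      pairs,
                                      callback=collected.extend,
                                      error_callback=report_error)
        ticks = 0
        while not async_result.ready():
            ticks += 1        # stand-in for "do other things in the main process"
            time.sleep(0.05)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit

    return collected, ticks


if __name__ == '__main__':
    results, ticks = run_without_blocking([(5, 3), (7, 4), (-8, 10), (4, 12)])
    print('results = {0}, main loop iterations = {1}'.format(results, ticks))
```

The second option keeps the with block but does the other work inside it and calls wait() or get() on the result object before the block ends, so that terminate() only runs once the tasks are done.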