Multiprocessing confusion - basics

Time: 2015-10-17 15:00:15

Tags: python

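Despite extensive searching, I am still struggling to get a particular function to run using multiple processes. The requirements are:

  • Limit the number of processes
  • Pass multiple arguments to map

The latest attempt runs, but time.sleep seems to affect all processes: the execution time is the same, 20 seconds, whether the pool is used for multiprocessing or foo is called directly (it should be roughly 4 and 20 seconds respectively). What am I missing?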

from multiprocessing import Pool, Process, Lock
import os
import time

def foo(arg):
    print '{} - {}'.format(arg[0], os.getpid())
    time.sleep(1)

if __name__ == '__main__':
    script_start_time = time.time()

    pool = Pool(processes=5)
    for i in range(20):
        arg = [i, i]
        pool.map(foo, [arg])

    pool.close() #necessary to prevent zombies
    pool.join() #wait for all processes to finish

    print 'Execution time {}s '.format(time.time() - script_start_time)

Result:

0 - 5660
1 - 5672
2 - 5684
3 - 5704
4 - 5716
5 - 5660
6 - 5672
7 - 5684
8 - 5704
9 - 5716
10 - 5660
11 - 5672
12 - 5684
13 - 5704
14 - 5716
15 - 5660
16 - 5672
17 - 5684
18 - 5704
19 - 5716
Execution time 20.4240000248s

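1 answer:

Answer 0 (score: 0)

As mentioned in the comments, pool.map blocks until execution has finished. Because the loop calls it with a single item each time, only one worker is ever busy, and the 20 one-second jobs run one after another. You therefore have to submit the jobs with apply_async or map_async and use a callback to handle the data the function returns. Alternatively, you can build all the inputs ahead of time and call map once on all of them.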
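The second option can be sketched as follows. This is a minimal Python 3 adaptation of the code from the question, not the asker's final solution; build_args is a hypothetical helper introduced here only to make the "build the inputs first" step explicit, and foo has been given a return value for illustration.

```python
from multiprocessing import Pool
import os
import time


def foo(arg):
    # arg is a single [i, i] list, exactly as in the question
    print('{} - {}'.format(arg[0], os.getpid()))
    time.sleep(1)
    return arg[0]


def build_args(n):
    # Hypothetical helper: build every input up front
    # instead of calling map once per item inside the loop
    return [[i, i] for i in range(n)]


if __name__ == '__main__':
    script_start_time = time.time()
    pool = Pool(processes=5)
    # One blocking call; the pool distributes the 20 items over the 5 workers
    results = pool.map(foo, build_args(20))
    pool.close()  # no more jobs will be submitted
    pool.join()   # wait for the worker processes to exit
    print('results', results)
    print('Execution time {}s'.format(time.time() - script_start_time))
```

With 20 one-second jobs spread over 5 workers, this should finish in about 4 seconds plus process start-up overhead, rather than 20.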

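In this case, apply_async and map_async behave very similarly. The difference is that apply_async submits only one job at a time and supports passing multiple args and kwargs. For example: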

from multiprocessing import Pool
import os
import time

def add(a, b):
    c = a+b
    print(f'{a}+{b} = {c} from process: {os.getpid()}') #python 3 f-strings are nifty :)
    time.sleep(1)
    return c

if __name__ == '__main__':
    script_start_time = time.time()
    pool = Pool(processes=5)
    results = []
    for a in range(5):
        for b in range(5,10):
            pool.apply_async(add, (a,b), callback=results.append)
    pool.close() #necessary to prevent zombies
    pool.join() #wait for all processes to finish
    print('results', results)
    print('Execution time {}s '.format(time.time() - script_start_time))

Note how the arguments are passed when calling apply_async: the positional arguments go in as a single tuple.
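To show the keyword-argument side as well, here is a small sketch that passes both a tuple of positional arguments and a dict of keyword arguments. The scale parameter and the submit_all helper are made up for illustration. Instead of a callback, it keeps the AsyncResult objects returned by apply_async and calls get() on each, which is one way to receive the results in submission order.

```python
from multiprocessing import Pool


def add(a, b, scale=1):
    # 'scale' is a hypothetical keyword argument added for this sketch
    return (a + b) * scale


def submit_all(pool):
    # args is a tuple of positional arguments, kwds a dict of keyword arguments
    handles = [pool.apply_async(add, args=(a, b), kwds={'scale': 2})
               for a in range(2) for b in range(5, 7)]
    # get() blocks until that particular job has finished
    return [h.get() for h in handles]


if __name__ == '__main__':
    pool = Pool(processes=5)
    print(submit_all(pool))  # [10, 12, 12, 14]
    pool.close()
    pool.join()
```

A callback appends results in completion order, so the list it builds may be shuffled; collecting the handles avoids that at the cost of blocking on each get().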

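Alternatively, you can use the regular map to pass all the arguments at once, but that requires your function to accept only a single argument. This is where the starmap method is useful. It takes an iterable of tuples and unpacks each tuple into the function's arguments, so pool.starmap(foo, [(a,b),(c,d),(e,f)]) unpacks each pair into a call to foo, which must take two arguments: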

# Reuses the imports (Pool, time) and the add function from the example above
if __name__ == '__main__':
    script_start_time = time.time()
    pool = Pool(processes=5)
    args = [(a,b) for a in "abc" for b in "ABC"]
    print(pool.starmap(add, args)) #same add function from before (works with strings too)
    pool.close() #necessary to prevent zombies
    pool.join() #wait for all processes to finish
    print('Execution time {}s '.format(time.time() - script_start_time))
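starmap was added in Python 3.3, so it is not available in the Python 2 code from the question. A rough equivalent that works with the plain map is a small module-level wrapper that unpacks the tuple itself. The name add_unpack is hypothetical and used here only for illustration:

```python
from multiprocessing import Pool


def add(a, b):
    return a + b


def add_unpack(pair):
    # Defined at module level so it can be pickled and sent to the workers;
    # it unpacks one (a, b) tuple into the two arguments add expects
    return add(*pair)


if __name__ == '__main__':
    pool = Pool(processes=5)
    args = [(a, b) for a in "abc" for b in "ABC"]
    print(pool.map(add_unpack, args))
    pool.close()
    pool.join()
```

A lambda would not work in place of add_unpack, because the function handed to map has to be picklable.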