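Despite extensive searching, I'm still having trouble getting a particular function to run across multiple processes. The requirements are: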
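My latest attempt runs, but `time.sleep` seems to affect all the processes. The execution time is the same 20 seconds whether `foo` is run through the pool or called directly, when it should be about 4 s and 20 s respectively. What am I missing?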
from multiprocessing import Pool, Process, Lock
import os
import time

def foo(arg):
    print '{} - {}'.format(arg[0], os.getpid())
    time.sleep(1)

if __name__ == '__main__':
    script_start_time = time.time()
    pool = Pool(processes=5)
    for i in range(20):
        arg = [i, i]
        pool.map(foo, [arg])
    pool.close() #necessary to prevent zombies
    pool.join() #wait for all processes to finish
    print 'Execution time {}s '.format(time.time() - script_start_time)
Output:
0 - 5660
1 - 5672
2 - 5684
3 - 5704
4 - 5716
5 - 5660
6 - 5672
7 - 5684
8 - 5704
9 - 5716
10 - 5660
11 - 5672
12 - 5684
13 - 5704
14 - 5716
15 - 5660
16 - 5672
17 - 5684
18 - 5704
19 - 5716
Execution time 20.4240000248s
Answer
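As mentioned in the comments, `pool.map` blocks until every job passed to it has finished. Because your loop calls it once per argument with a one-element list, each call waits about 1 s for its single job before the next one is submitted, so the 20 jobs still run one after another. To overlap them, either submit the jobs with `apply_async` or `map_async` and handle the function's return values in a callback, or build all the inputs up front and call `map` once on the whole list.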
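A minimal sketch of the second option applied to the question's code, assuming Python 3 (the question uses Python 2 `print` statements). `run_all` is a hypothetical helper, and `foo` now returns its job number only so the results can be checked. With 20 one-second jobs spread over 5 workers, the single `map` call should take roughly 4 s rather than 20 s; the exact figure depends on process start-up overhead.

```python
from multiprocessing import Pool
import os
import time

def foo(arg):
    # Report which worker handled the job, then simulate 1 s of work
    print('{} - {}'.format(arg[0], os.getpid()))
    time.sleep(1)
    return arg[0]  # returned only so the caller can check the results

def run_all(n_jobs=20, n_procs=5):
    # Hypothetical helper: build every input first, then make ONE map() call
    args = [[i, i] for i in range(n_jobs)]
    start = time.time()
    pool = Pool(processes=n_procs)
    results = pool.map(foo, args)  # blocks once, until all n_jobs are done
    pool.close()  # no more jobs will be submitted
    pool.join()   # wait for the worker processes to exit
    return results, time.time() - start

if __name__ == '__main__':
    results, elapsed = run_all()
    print('Execution time {:.1f}s'.format(elapsed))
```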
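In this example `apply_async` and `map_async` are very similar; the difference is that `apply_async` submits only one job per call but accepts multiple `args` and `kwargs`. For example: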
from multiprocessing import Pool
import os
import time

def add(a, b):
    c = a+b
    print(f'{a}+{b} = {c} from process: {os.getpid()}') #python 3 f-strings are nifty :)
    time.sleep(1)
    return c

if __name__ == '__main__':
    script_start_time = time.time()
    pool = Pool(processes=5)
    results = []
    for a in range(5):
        for b in range(5,10):
            pool.apply_async(add, (a,b), callback=lambda c: results.append(c))
    pool.close() #necessary to prevent zombies
    pool.join() #wait for all processes to finish
    print('results', results)
    print('Execution time {}s '.format(time.time() - script_start_time))
Note how the arguments are passed to `apply_async`: as a tuple `(a, b)` that is unpacked into the parameters of `add`.
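For comparison, a sketch of the same idea with `map_async`, assuming Python 3. It takes one iterable (so the function receives a single argument), returns immediately, and its callback fires once with the complete list of results in input order, rather than once per job as with `apply_async`. `square` and `run_async` are hypothetical names used only for illustration.

```python
from multiprocessing import Pool
import os
import time

def square(x):
    # Report which worker ran the job, then simulate 1 s of work
    print('{} from process: {}'.format(x, os.getpid()))
    time.sleep(1)
    return x * x

def run_async(values, n_procs=5):
    collected = []
    pool = Pool(processes=n_procs)
    # Returns an AsyncResult right away; the callback runs in the main
    # process and receives the whole list of results once all jobs finish
    async_result = pool.map_async(square, values, callback=collected.extend)
    print('map_async returned; the main process is free to do other work')
    pool.close()  # no more jobs will be submitted
    pool.join()   # wait for all processes to finish
    async_result.get()  # re-raises any exception raised inside a worker
    return collected

if __name__ == '__main__':
    start = time.time()
    print('results', run_async(range(10)))
    print('Execution time {:.1f}s'.format(time.time() - start))
```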
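Alternatively, you can use a plain `map` to pass all the arguments at once, but that requires your function to take a single argument. This is where the `starmap` method is useful: it takes an iterable of tuples and unpacks each tuple into the function's arguments, so `pool.starmap(foo, [(a,b),(c,d),(e,f)])` calls `foo(a, b)`, `foo(c, d)` and `foo(e, f)`, where `foo` takes two arguments: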
if __name__ == '__main__':
    script_start_time = time.time()
    pool = Pool(processes=5)
    args = [(a,b) for a in "abc" for b in "ABC"]
    print(pool.starmap(add, args)) #same add function from before (works with strings too)
    pool.close() #necessary to prevent zombies
    pool.join() #wait for all processes to finish
    print('Execution time {}s '.format(time.time() - script_start_time))
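`starmap` only exists in Python 3.3 and later, so it is not available on the Python 2 interpreter the question's code targets. A common workaround is a one-argument wrapper that unpacks the tuple itself. The sketch below uses a simplified `add` without the print and sleep, and `add_pair` and `combine_all` are hypothetical names. The wrapper has to be defined at module level: the pool pickles the function by name to send it to the workers, so a lambda would fail here. (The `lambda` callback in the `apply_async` example is fine because callbacks run in the parent process.)

```python
from multiprocessing import Pool

def add(a, b):
    # Simplified version of add(): no print or sleep
    return a + b

def add_pair(pair):
    # map() passes exactly one item per call, so unpack the tuple here;
    # this is the step starmap performs for you
    return add(*pair)

def combine_all(args, n_procs=5):
    pool = Pool(processes=n_procs)
    results = pool.map(add_pair, args)  # results keep the order of args
    pool.close()  # no more jobs will be submitted
    pool.join()   # wait for all processes to finish
    return results

if __name__ == '__main__':
    args = [(a, b) for a in "abc" for b in "ABC"]
    print(combine_all(args))
    # ['aA', 'aB', 'aC', 'bA', 'bB', 'bC', 'cA', 'cB', 'cC']
```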