Why is multiprocessing Pool.map not faster than serial map?

Asked: 2018-01-28 18:14:39

Tags: python python-3.x numpy multiprocessing pool

I have this very simple Python code that I would like to speed up by parallelizing it. However, no matter what I do, multiprocessing.Pool.map does not gain anything over the standard map.

I have read other threads where very small work functions do not parallelize well and cause excessive overhead, but I don't think that should be the case here.

Am I doing something wrong?

Here is the example:

#!/usr/bin/python

import numpy, time

def AddNoise(sample):
    #time.sleep(0.001)
    return sample + numpy.random.randint(0,9,sample.shape)
    #return sample + numpy.ones(sample.shape)

n=100
m=10000
start = time.time()
A = list([ numpy.random.randint(0,9,(n,n)) for i in range(m) ])
print("creating %d numpy arrays of %d x %d took %.2f seconds"%(m,n,n,time.time()-start))

for i in range(3):
    start = time.time()
    A = list(map(AddNoise, A))
    print("adding numpy arrays took %.2f seconds"%(time.time()-start))

for i in range(3):
    import multiprocessing
    start = time.time()
    with multiprocessing.Pool(processes=2) as pool:
        A = list(pool.map(AddNoise, A, chunksize=100))
    print("adding numpy arrays with multiprocessing Pool took %.2f seconds"%(time.time()-start))

for i in range(3):
    import concurrent.futures
    start = time.time()
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        A = list(executor.map(AddNoise, A))
    print("adding numpy arrays with concurrent.futures.ProcessPoolExecutor took %.2f seconds"%(time.time()-start))

This gives the following output on my otherwise idle 4-core / 8-thread laptop:

$ python test-pool.py 
creating 10000 numpy arrays of 100 x 100 took 1.54 seconds
adding numpy arrays took 1.65 seconds
adding numpy arrays took 1.51 seconds
adding numpy arrays took 1.51 seconds
adding numpy arrays with multiprocessing Pool took 1.99 seconds
adding numpy arrays with multiprocessing Pool took 1.98 seconds
adding numpy arrays with multiprocessing Pool took 1.94 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.32 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.17 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.25 seconds

1 Answer:

Answer 0 (score: 5):

The problem is in the transfer of the results. With multiprocessing, the arrays you create inside the child processes have to be transferred back to the main process, and that is an overhead.

I checked this by modifying the AddNoise function so that it keeps the computation time but discards the transfer time:

def AddNoise(sample):
    # Do the same computation, but discard the result so that nothing
    # has to be sent back to the parent process.
    sample + numpy.random.randint(0,9,sample.shape)
    return None
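
A minimal sketch to estimate that transfer cost directly (not part of the original answer; the array count and shape simply mirror the question's setup) is to time a pickle round trip of the same data, since pool.map pickles each result in the child process and unpickles it in the parent:

import pickle, time
import numpy

# Mirror the question's data: 10000 arrays of 100 x 100 small integers.
m, n = 10000, 100
A = [numpy.random.randint(0, 9, (n, n)) for _ in range(m)]

start = time.time()
# Round-tripping through pickle approximates what multiprocessing does
# when it sends each result array back to the parent process.
B = [pickle.loads(pickle.dumps(a)) for a in A]
print("pickle round trip of %d arrays took %.2f seconds" % (m, time.time() - start))

If that round trip takes on the order of the computation itself, the pool cannot come out ahead on a task this cheap.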