How does `batch_size` in joblib.Parallel differ from `chunksize` in multiprocessing.Pool.map?

Asked: 2018-12-17 19:19:39

Tags: python python-multiprocessing joblib

There are two things that are unclear to me about how the `batch_size` option in joblib works. The following code captures my concern.

from multiprocessing import Pool
from joblib import Parallel, delayed
import time
import numpy as np
import sys

def doubler(number):
    time.sleep(0.01)
    return number * 2

def main(idx,num_proc):
    num_elements = 1000
    num_iter     = 2

    # multiprocessing    
    if idx == 0:
        x = np.arange(1,num_elements,1,int)
        print("Module: multiprocessing.Pool")
        chunk = int(num_elements/(4*num_proc))
        pool = Pool(processes=num_proc)
        for iter in range(num_iter):
            t1 = time.time()
            result = pool.map(doubler, x, chunk)
            print("num_proc =", num_proc, ", runtime = ", time.time() - t1)
        pool.close()    

    # joblib
    elif idx == 1:
        print("Module: joblib.Parallel")
#        chunk = int(num_elements/(4*num_proc))
        with Parallel(n_jobs=num_proc,backend='multiprocessing',batch_size='auto',prefer='processes',max_nbytes=None,verbose=10) as parallel:
#        with Parallel(n_jobs=num_proc,backend='multiprocessing',batch_size=chunk,pre_dispatch=chunk,prefer='processes',max_nbytes=None,verbose=20) as parallel:
            for iter in range(num_iter):    
                t1 = time.time()
                result = parallel(delayed(doubler)(x) for x in np.arange(1,num_elements,1,int))        
                print("num_proc =", num_proc, ", runtime = ", time.time() - t1)

if __name__ == '__main__':
    idx = 1
    num_proc = 2
    if len(sys.argv) > 1:
        idx = int(sys.argv[1])
        num_proc = int(sys.argv[2])
    main(idx,num_proc)

First, the code with idx = 1 shows the role of batch_size in joblib. When I run the program, the defaults must be pre_dispatch = 4 and batch_size = 'auto' with an initial size of 1. As the verbose messages show, it starts with batch_size = 1, then switches to batch_size = 2 after 12 tasks, then reaches batch_size = 26 after 20 tasks. Overall, the first run took 5.7 seconds.

Module: joblib.Parallel
[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done   2 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done   3 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done   4 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done   5 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done   6 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done   7 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done   8 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done   9 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Batch computation too fast (0.1690s.) Setting batch_size=2.
[Parallel(n_jobs=2)]: Done  10 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done  11 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done  12 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done  14 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Batch computation too fast (0.0301s.) Setting batch_size=26.
[Parallel(n_jobs=2)]: Done  16 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done  18 tasks      | elapsed:    0.5s
[Parallel(n_jobs=2)]: Done  20 tasks      | elapsed:    0.5s
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.7s
...
[Parallel(n_jobs=2)]: Done 999 out of 999 | elapsed:    5.6s finished
num_proc = 2 , runtime =  5.668241024017334
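The batch-size jumps in the log above (1 → 2 → 26) are consistent with joblib 0.12's adaptive heuristic for batch_size='auto': when a batch completes faster than a minimum ideal duration (about 0.2 s), the batch size is scaled up toward that target and then doubled to damp oscillation. A minimal stdlib-only sketch of that rule (the 0.2 s constant and the doubling factor are my reading of joblib 0.12's backend code, not part of the question's code):

```python
# Sketch of joblib 0.12's batch-size growth rule for batch_size='auto'.
# MIN_IDEAL_BATCH_DURATION and the doubling are assumptions taken from
# joblib 0.12's backend source; treat this as illustrative only.
MIN_IDEAL_BATCH_DURATION = 0.2  # seconds

def next_batch_size(old_batch_size, batch_duration):
    """Return the grown batch size when a batch finished too fast."""
    if 0 < batch_duration < MIN_IDEAL_BATCH_DURATION:
        ideal = int(old_batch_size * MIN_IDEAL_BATCH_DURATION / batch_duration)
        # Double to limit oscillation between growing and shrinking.
        return max(2 * ideal, 1)
    return old_batch_size

# Reproduces the two jumps reported in the verbose log:
print(next_batch_size(1, 0.1690))  # -> 2
print(next_batch_size(2, 0.0301))  # -> 26
```

Under this rule the batch size keeps growing as long as batches finish well under the target duration, which is exactly the pattern the log shows.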

Now, reusing the same pool, I run the same function again.

[Parallel(n_jobs=2)]: Done   4 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done  30 tasks      | elapsed:    0.2s
[Parallel(n_jobs=2)]: Done  56 tasks      | elapsed:    0.5s
[Parallel(n_jobs=2)]: Done  82 tasks      | elapsed:    0.8s
...
[Parallel(n_jobs=2)]: Done 999 tasks      | elapsed:   10.3s
[Parallel(n_jobs=2)]: Done 999 out of 999 | elapsed:   10.3s finished
num_proc = 2 , runtime =  10.353703498840332

From these messages it is clear that pre_dispatch = 4 and batch_size = 26. The batch_size of 26 is carried over from the previous run. However, the second run took 10.4 seconds, twice as slow as the first.

So my first question is:

1. Other than *not* reusing the pool, is there a way to work around the batch_size carry-over problem?
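(One workaround I would try, based on my understanding that an explicit integer batch_size disables the 'auto' adaptation entirely, so nothing is carried over between runs inside the same context; the value 1 below is purely illustrative:)

```python
from joblib import Parallel, delayed

def doubler(number):
    return number * 2

# Passing an explicit integer batch_size (instead of 'auto') fixes the
# batch size for every run, so no adapted value carries over when the
# same Parallel context is reused.
with Parallel(n_jobs=2, backend='multiprocessing', batch_size=1) as parallel:
    for _ in range(2):
        result = parallel(delayed(doubler)(x) for x in range(10))
        print(result)  # same result and batching behavior on both runs
```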


Second, I tried to understand the overhead (if any) of joblib.Parallel compared with multiprocessing.Pool. To that end, I changed the code to
chunk = int(num_elements/(4*num_proc))
with Parallel(n_jobs=num_proc,backend='multiprocessing',batch_size=chunk,pre_dispatch=chunk,prefer='processes',max_nbytes=None,verbose=20) as parallel:

As far as I understand (please correct me if I am wrong), the default chunksize in multiprocessing.Pool.map is num_elements / (4 * num_proc). However, the results show that joblib.Parallel took 10.6 seconds while multiprocessing.Pool took 5.4 seconds (you can check this by setting idx = 0).
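(For reference, CPython's Pool.map computes its default chunksize with a divmod and rounds up, so it is close to but not exactly num_elements / (4 * num_proc). A stdlib-only sketch mirroring CPython's multiprocessing.pool logic; the exact details are version-dependent:)

```python
def default_chunksize(num_elements, num_proc):
    """Default chunksize as computed inside CPython's multiprocessing.pool
    when Pool.map is called with chunksize=None (version-dependent)."""
    chunksize, extra = divmod(num_elements, num_proc * 4)
    if extra:
        chunksize += 1
    return chunksize

# For the 999 elements and 2 processes used above:
print(default_chunksize(999, 2))  # -> 125
```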

Then my second question is:

2. How is `batch_size` in joblib different from `chunksize` in multiprocessing?

(Note that I am using Python 3.6.5 and joblib 0.12.5.)

0 Answers:

No answers yet