There are two things I don't understand about how the `batch_size` option in joblib works. The following code captures my concern.
    from multiprocessing import Pool
    from joblib import Parallel, delayed
    import time
    import numpy as np
    import sys

    def doubler(number):
        time.sleep(0.01)
        return number * 2

    def main(idx, num_proc):
        num_elements = 1000
        num_iter = 2
        # multiprocessing
        if idx == 0:
            x = np.arange(1, num_elements, 1, int)
            print("Module: multiprocessing.Pool")
            chunk = int(num_elements / (4 * num_proc))
            pool = Pool(processes=num_proc)
            for iter in range(num_iter):
                t1 = time.time()
                result = pool.map(doubler, x, chunk)
                print("num_proc =", num_proc, ", runtime = ", time.time() - t1)
            pool.close()
        # joblib
        elif idx == 1:
            print("Module: joblib.Parallel")
            # chunk = int(num_elements / (4 * num_proc))
            with Parallel(n_jobs=num_proc, backend='multiprocessing', batch_size='auto',
                          prefer='processes', max_nbytes=None, verbose=10) as parallel:
            # with Parallel(n_jobs=num_proc, backend='multiprocessing', batch_size=chunk,
            #               pre_dispatch=chunk, prefer='processes', max_nbytes=None, verbose=20) as parallel:
                for iter in range(num_iter):
                    t1 = time.time()
                    result = parallel(delayed(doubler)(x) for x in np.arange(1, num_elements, 1, int))
                    print("num_proc =", num_proc, ", runtime = ", time.time() - t1)

    if __name__ == '__main__':
        idx = 1
        num_proc = 2
        if len(sys.argv) > 1:
            idx = int(sys.argv[1])
            num_proc = int(sys.argv[2])
        main(idx, num_proc)
First, the idx = 1 branch shows the role of batch_size in joblib. When I run the program, the defaults should be pre_dispatch = 4 (i.e. '2*n_jobs') and batch_size = 'auto' with an initial size of 1. As the verbose messages show, it starts with batch_size = 1, switches to batch_size = 2 after 12 tasks, and then reaches batch_size = 26 after 20 tasks. Overall, the first run took 5.7 seconds.
Module: joblib.Parallel
[Parallel(n_jobs=2)]: Using backend MultiprocessingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Done 2 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Done 3 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Done 4 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Done 5 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Done 6 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Done 7 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Done 8 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Done 9 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Batch computation too fast (0.1690s.) Setting batch_size=2.
[Parallel(n_jobs=2)]: Done 10 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Done 11 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Done 12 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Done 14 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Batch computation too fast (0.0301s.) Setting batch_size=26.
[Parallel(n_jobs=2)]: Done 16 tasks | elapsed: 0.4s
[Parallel(n_jobs=2)]: Done 18 tasks | elapsed: 0.5s
[Parallel(n_jobs=2)]: Done 20 tasks | elapsed: 0.5s
[Parallel(n_jobs=2)]: Done 46 tasks | elapsed: 0.7s
...
[Parallel(n_jobs=2)]: Done 999 out of 999 | elapsed: 5.6s finished
num_proc = 2 , runtime = 5.668241024017334
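The jumps from 1 to 2 to 26 line up with an auto-batching heuristic. As far as I can tell from reading the joblib 0.12 source (so the helper below is my reconstruction, not documented API), when a batch finishes faster than an internal ~0.2 s target, the next batch size is scaled up to roughly twice the size that would have hit that target:

```python
# Reconstruction (my assumption, from reading joblib 0.12's auto-batching
# code) of how batch_size='auto' grows the batch when batches finish too
# fast: aim for ~2x the 0.2 s minimum ideal batch duration.
MIN_IDEAL_BATCH_DURATION = 0.2  # seconds; joblib-internal constant

def next_batch_size(old_size, batch_duration):
    """Hypothetical helper mirroring the growth rule seen in the log."""
    return max(int(2 * old_size * MIN_IDEAL_BATCH_DURATION / batch_duration), 1)

# Both updates in the verbose log above are reproduced exactly:
print(next_batch_size(1, 0.1690))  # -> 2  ("too fast (0.1690s)")
print(next_batch_size(2, 0.0301))  # -> 26 ("too fast (0.0301s)")
```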
Now, reusing the same pool, I run the same function again.
[Parallel(n_jobs=2)]: Done 4 tasks | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 30 tasks | elapsed: 0.2s
[Parallel(n_jobs=2)]: Done 56 tasks | elapsed: 0.5s
[Parallel(n_jobs=2)]: Done 82 tasks | elapsed: 0.8s
...
[Parallel(n_jobs=2)]: Done 999 tasks | elapsed: 10.3s
[Parallel(n_jobs=2)]: Done 999 out of 999 | elapsed: 10.3s finished
num_proc = 2 , runtime = 10.353703498840332
From these messages it is clear that pre_dispatch = 4 and batch_size = 26: the batch_size of 26 was carried over from the previous run. However, the second run took 10.4 seconds, twice as slow as the first.
My first question is: 1. Other than *not* reusing the pool, is there a way to work around this batch_size carry-over problem?
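One workaround I have considered (a sketch only; I have not verified that it restores the first-run timing) is to pin batch_size to an explicit integer instead of 'auto', so there is no adaptive state to carry over between runs inside the same context:

```python
from joblib import Parallel, delayed

def doubler(number):
    return number * 2

# With an explicit integer batch_size, nothing learned during the first
# run can leak into the second one; both runs dispatch identical batches.
with Parallel(n_jobs=2, backend='multiprocessing', batch_size=26) as parallel:
    for run in range(2):
        result = parallel(delayed(doubler)(x) for x in range(1, 1000))

print(result[:3])  # -> [2, 4, 6]
```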
Second, I am trying to understand what overhead (if any) joblib.Parallel has compared to multiprocessing.Pool. To that end, I changed the code by adding

    chunk = int(num_elements / (4 * num_proc))

and switching to

    with Parallel(n_jobs=num_proc, backend='multiprocessing', batch_size=chunk,
                  pre_dispatch=chunk, prefer='processes', max_nbytes=None, verbose=20) as parallel:
As far as I understand (please correct me if I'm wrong), the default chunksize in multiprocessing.Pool.map is num_elements / (4 * num_proc). However, the results show that joblib.Parallel took 10.6 seconds, while multiprocessing.Pool took 5.4 seconds (you can check by setting idx = 0).
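For reference, CPython's Pool.map (in multiprocessing/pool.py) computes its default chunksize with a divmod heuristic, which is essentially the num_elements / (4 * num_proc) figure above, rounded up when there is a remainder:

```python
def default_pool_chunksize(num_elements, num_proc):
    # Mirrors CPython's Pool.map default (multiprocessing/pool.py):
    #   chunksize, extra = divmod(len(iterable), num_workers * 4)
    #   if extra: chunksize += 1
    chunksize, extra = divmod(num_elements, num_proc * 4)
    if extra:
        chunksize += 1
    return chunksize

# With the 999 elements of np.arange(1, 1000) and 2 workers:
print(default_pool_chunksize(999, 2))  # -> 125
```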
So my second question is: 2. How is joblib's batch_size different from multiprocessing's chunksize?
(Note that I am using Python 3.6.5 and joblib 0.12.5.)