The default multiprocessing.Pool code includes a counter that tracks how many tasks a worker has completed:
completed += 1
logging.debug('worker exiting after %d tasks' % completed)
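However, going from range(12) to range(20) in a pool.map call throws the counter off (this does not seem to be related to worker creation). I'm not clear on what causes this.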
For example:
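import multiprocessing as mp

def ret_x(x):
    # Trivial task: return the input unchanged.
    return x

def inform():
    # Initializer: runs once in every newly started worker process.
    print('made a worker!')

if __name__ == '__main__':
    # Two workers, each replaced after completing two tasks.
    pool = mp.Pool(2, maxtasksperchild=2, initializer=inform)
    res = pool.map(ret_x, range(8))
    print(res)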
This correctly prints:
made a worker!
made a worker!
worker exiting after 2 tasks
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
[0, 1, 2, 3, 4, 5, 6, 7]
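However, changing the range to 20 shows neither any additional workers being created nor a total of 20 completed tasks, even though the complete range comes back in the returned list as expected.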
made a worker!
made a worker!
worker exiting after 2 tasks
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
made a worker!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
worker exiting after 1 tasks
Answer 0 (score: 1)
It behaves this way because you did not explicitly set "chunksize" in pool.map:
map(func, iterable[, chunksize])
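"This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer."
Source: https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool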
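To make the quoted behaviour concrete, here is a minimal sketch of how an iterable can be cut into chunks of a given size. The helper name split_into_chunks is made up for illustration; it imitates the idea and is not the pool's actual internal code.

```python
from itertools import islice

def split_into_chunks(iterable, chunksize):
    """Yield successive tuples of at most `chunksize` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return  # iterable exhausted
        yield chunk

chunks = list(split_into_chunks(range(20), 3))
print(len(chunks))   # 7
print(chunks[-1])    # (18, 19)
```

Each of these chunks, rather than each individual item, is what gets handed to a worker as one task.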
For 8 items with len(pool) = 2, chunksize is 1 (divmod(8, 2 * 4) gives (1, 0)), so there are 8 tasks and you see (8 / 1) / 2 = 4 workers.
workers = (len of items / chunksize) / tasks per process
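For 20 items with len(pool) = 2, chunksize is 3 (divmod(20, 2 * 4) gives (2, 4), and the nonzero remainder rounds the quotient up), so the 20 items become 7 chunks and you see something like (20 / 3) / 2 ≈ 3.3 workers. In practice that is 4 workers, the last of which completes only 1 task.
For 40 items, chunksize = 5 and workers = (40 / 5) / 2 = 4.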
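The arithmetic above can be checked with a small sketch, written for Python 3. The function names default_chunksize and min_workers are hypothetical; the first follows the divmod rule from the pool.py source linked in this answer, and the second only estimates the minimum number of workers needed, assuming n_items > 0. The real pool may start an extra worker depending on scheduling.

```python
import math

def default_chunksize(n_items, pool_size):
    # Same rule as in the linked pool.py: divide by 4 * pool size, round up.
    chunksize, extra = divmod(n_items, pool_size * 4)
    if extra:
        chunksize += 1
    return chunksize

def min_workers(n_items, pool_size, maxtasksperchild, chunksize=None):
    # Hypothetical helper: each chunk counts as one task for a worker.
    if chunksize is None:
        chunksize = default_chunksize(n_items, pool_size)
    n_chunks = math.ceil(n_items / chunksize)
    return math.ceil(n_chunks / maxtasksperchild)

for n in (8, 20, 40):
    print(n, default_chunksize(n, 2), min_workers(n, 2, 2))
# 8 1 4
# 20 3 4
# 40 5 4
```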
If you want, you can set chunksize = 1:
res = pool.map(ret_x, range(20), 1)
You will then see (20 / 1) / 2 = 10 workers:
python mppp.py
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
So chunksize is effectively the unit of work for a process: each chunk, not each item, counts as one task toward maxtasksperchild.
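As a rough illustration of why the log in the question shows three workers exiting after 2 tasks and one after 1 task, the sketch below models the per-worker counter. The function tasks_per_worker is hypothetical, and it treats workers as running one after another, which ignores the real pool's concurrency.

```python
def tasks_per_worker(n_items, chunksize, maxtasksperchild):
    """Return how many chunks each successive worker would complete.

    Simplification: workers are modelled sequentially, ignoring the
    real pool's concurrency and scheduling.
    """
    n_chunks = -(-n_items // chunksize)  # ceiling division
    counts = []
    while n_chunks > 0:
        done = min(maxtasksperchild, n_chunks)
        counts.append(done)
        n_chunks -= done
    return counts

print(tasks_per_worker(20, 3, 2))       # [2, 2, 2, 1]
print(len(tasks_per_worker(20, 1, 2)))  # 10
```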
How chunksize is calculated: https://hg.python.org/cpython/file/1c54def5947c/Lib/multiprocessing/pool.py#l305