Python: multiprocessing workers, tracking completed tasks (missing completions)

Time: 2015-01-23 00:38:01

Tags: python multiprocessing

The default multiprocessing.Pool code contains a counter that tracks how many tasks each worker has completed:

    while maxtasks is None or (maxtasks and completed < maxtasks):
        ...
        completed += 1
    logging.debug('worker exiting after %d tasks' % completed)
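To make the counter's behaviour concrete, here is a minimal single-process model of that worker loop. It is a simplified sketch, not the real pool.py code: the helper name `worker_loop` and the `deque` standing in for the pool's task queue are assumptions for illustration. What it shows is that `completed` goes up once per task taken from the queue, however many items that task carries.

```python
from collections import deque


def worker_loop(task_queue, func, maxtasks=None):
    """Simplified model of a Pool worker (hypothetical helper, not pool.py).

    Each entry in task_queue is one task: a tuple (chunk) of input items.
    Returns (completed, results); completed counts tasks, not items.
    """
    completed = 0
    results = []
    while maxtasks is None or completed < maxtasks:
        if not task_queue:
            break  # stands in for the sentinel that tells a real worker to stop
        chunk = task_queue.popleft()
        results.append([func(x) for x in chunk])
        completed += 1  # one increment per task (chunk), not per item
    print('worker exiting after %d tasks' % completed)
    return completed, results


tasks = deque([(0, 1, 2), (3, 4, 5), (6, 7)])
done, out = worker_loop(tasks, lambda x: x, maxtasks=2)     # prints "worker exiting after 2 tasks"
done2, out2 = worker_loop(tasks, lambda x: x, maxtasks=2)   # prints "worker exiting after 1 tasks"
```

Eight items end up counted as only 3 completed tasks here, which is the same effect the question observes.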

But going from range(12) up to range(20) in a pool.map throws the counter off (this seems unrelated to worker creation). I don't understand what causes it.

For example:

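import multiprocessing as mp

def ret_x(x):
    return x

def inform():
    # runs once in each new worker process (the Pool initializer)
    print('made a worker!')

if __name__ == '__main__':
    # 2 processes, each replaced after completing 2 tasks
    pool = mp.Pool(2, maxtasksperchild=2, initializer=inform)
    res = pool.map(ret_x, range(8))
    print(res)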

correctly gives:

made a worker!
made a worker!
worker exiting after 2 tasks
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
[0, 1, 2, 3, 4, 5, 6, 7]

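However, changing the range to 20 shows neither any additional workers being created nor 20 completed tasks in total, even though the full range is returned in the expected list: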

made a worker!
made a worker!
worker exiting after 2 tasks
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
made a worker!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
worker exiting after 1 tasks

1 Answer:

Answer 0 (score: 1)

It behaves this way because you did not explicitly set chunksize in pool.map:

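map(func, iterable[, chunksize])

This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.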

Source: https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool

For 8 items with len(pool) = 2, chunksize will be 1 (divmod(8, 2 * 4) = (1, 0)), so there are 8 / 1 = 8 tasks and, at 2 tasks per worker, you see 8 / 2 = 4 workers.
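The default chunksize computation described above can be reproduced in a few lines. This sketch mirrors the logic in CPython's `Pool.map` / `map_async` (divmod by four times the number of processes, rounding up when there is a remainder); the function name `default_chunksize` is hypothetical.

```python
def default_chunksize(n_items, n_processes):
    """Mirror of Pool.map's default chunksize (hypothetical helper name)."""
    chunksize, extra = divmod(n_items, n_processes * 4)
    if extra:
        chunksize += 1  # round up so the items fit in the chunks
    return chunksize


for n in (8, 20, 40):
    print(n, default_chunksize(n, 2))
# 8 1
# 20 3
# 40 5
```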

workers ≈ ceil(ceil(len(items) / chunksize) / maxtasksperchild)
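The same rule written as code, counting tasks with ceiling division because the last chunk may be shorter than the others. This is a rough estimate under an idealized assumption that every worker except possibly the last runs exactly `maxtasksperchild` tasks; the real pool always starts its initial processes and replaces exiting workers on a timing-dependent schedule, so the printed count can differ. `estimate_workers` is a hypothetical name.

```python
import math


def estimate_workers(n_items, chunksize, maxtasksperchild):
    """Rough count of worker processes used over the run (hypothetical helper)."""
    n_tasks = math.ceil(n_items / chunksize)  # one task per chunk
    return math.ceil(n_tasks / maxtasksperchild)


print(estimate_workers(8, 1, 2))   # 8 tasks  -> 4
print(estimate_workers(20, 3, 2))  # 7 tasks  -> 4
print(estimate_workers(20, 1, 2))  # 20 tasks -> 10
```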

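For 20 items with len(pool) = 2, chunksize will be 3 (divmod(20, 2 * 4) = (2, 4), and the nonzero remainder rounds it up), so there are ceil(20 / 3) = 7 tasks and 7 / 2 = 3.5, i.e. 4 workers, the last of which exits after only 1 task.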
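The leftover "1 tasks" in the question's output follows from this arithmetic. The sketch below splits the task count into per-worker batches under the idealized assumption that workers consume tasks one full batch after another; the real pool interleaves its processes and is timing-dependent, so treat this only as an illustration of the pattern printed in the question. `tasks_per_worker` is a hypothetical helper.

```python
import math


def tasks_per_worker(n_items, chunksize, maxtasksperchild):
    """Idealized split of tasks across successive workers (hypothetical helper).

    Assumes each worker runs maxtasksperchild tasks before exiting and the
    last worker runs whatever is left over.
    """
    n_tasks = math.ceil(n_items / chunksize)
    full, rest = divmod(n_tasks, maxtasksperchild)
    return [maxtasksperchild] * full + ([rest] if rest else [])


print(tasks_per_worker(20, 3, 2))  # [2, 2, 2, 1]
print(tasks_per_worker(8, 1, 2))   # [2, 2, 2, 2]
```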

For 40 items, chunksize = 5 (divmod(40, 8) = (5, 0)), so workers = (40 / 5) / 2 = 4.

If you want, you can set chunksize = 1 explicitly:

res = pool.map(ret_x, range(20), 1)

and you will see (20 / 1) / 2 = 10 workers:

python mppp.py
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

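So a chunk is effectively the unit of work handed to a process: maxtasksperchild and the completed counter count chunks (tasks), not individual items.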
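To see what one such unit looks like, the sketch below cuts an iterable into chunks with `itertools.islice`, roughly the way CPython's private `Pool._get_tasks` helper groups items before putting them on the task queue. The name `make_chunks` is hypothetical; each tuple it yields is one task as far as maxtasksperchild is concerned.

```python
from itertools import islice


def make_chunks(iterable, chunksize):
    """Yield successive tuples of up to chunksize items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


chunks = list(make_chunks(range(20), 3))
print(len(chunks))               # 7 tasks
print([len(c) for c in chunks])  # [3, 3, 3, 3, 3, 3, 2]
```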

How chunksize is calculated: https://hg.python.org/cpython/file/1c54def5947c/Lib/multiprocessing/pool.py#l305