Is chunksize irrelevant for multiprocessing / pool.map in Python?

Asked: 2018-11-14 18:49:55

Tags: python multithreading multiprocessing python-multiprocessing python-multithreading

I am trying to utilize Python's multiprocessing Pool functionality.

Independent of how I set the chunksize (under Windows 7 and Ubuntu - the latter, shown below, with 4 cores), the number of parallel worker processes seems to stay the same.

from multiprocessing import Pool
from multiprocessing import cpu_count
import multiprocessing
import time


def f(x):
    print("ready to sleep", x, multiprocessing.current_process())
    time.sleep(20)
    print("slept with:", x, multiprocessing.current_process())


if __name__ == '__main__':
    processes = cpu_count()
    print('-' * 20)
    print('Utilizing %d cores' % processes)
    print('-' * 20)
    pool = Pool(processes)
    myList = []
    runner = 0
    while runner < 40:
        myList.append(runner)
        runner += 1
    print("len(myList):", len(myList))

    # chunksize = int(len(myList) / processes)
    # chunksize = processes
    chunksize = 1
    print("chunksize:", chunksize)
    pool.map(f, myList, chunksize)

The behaviour is identical whether I use chunksize = int(len(myList) / processes), chunksize = processes, or 1 (as in the example above).

Could it be that the chunksize is automatically set to the number of cores?

Example output for chunksize = 1:

--------------------
Utilizing 4 cores
--------------------
len(myList): 40
chunksize: 1
ready to sleep 0 <ForkProcess(ForkPoolWorker-1, started daemon)>
ready to sleep 1 <ForkProcess(ForkPoolWorker-2, started daemon)>
ready to sleep 2 <ForkProcess(ForkPoolWorker-3, started daemon)>
ready to sleep 3 <ForkProcess(ForkPoolWorker-4, started daemon)>
slept with: 0 <ForkProcess(ForkPoolWorker-1, started daemon)>
ready to sleep 4 <ForkProcess(ForkPoolWorker-1, started daemon)>
slept with: 1 <ForkProcess(ForkPoolWorker-2, started daemon)>
ready to sleep 5 <ForkProcess(ForkPoolWorker-2, started daemon)>
slept with: 2 <ForkProcess(ForkPoolWorker-3, started daemon)>
ready to sleep 6 <ForkProcess(ForkPoolWorker-3, started daemon)>
slept with: 3 <ForkProcess(ForkPoolWorker-4, started daemon)>
ready to sleep 7 <ForkProcess(ForkPoolWorker-4, started daemon)>
slept with: 4 <ForkProcess(ForkPoolWorker-1, started daemon)>
ready to sleep 8 <ForkProcess(ForkPoolWorker-1, started daemon)>
slept with: 5 <ForkProcess(ForkPoolWorker-2, started daemon)>
ready to sleep 9 <ForkProcess(ForkPoolWorker-2, started daemon)>
slept with: 6 <ForkProcess(ForkPoolWorker-3, started daemon)>
ready to sleep 10 <ForkProcess(ForkPoolWorker-3, started daemon)>
slept with: 7 <ForkProcess(ForkPoolWorker-4, started daemon)>
ready to sleep 11 <ForkProcess(ForkPoolWorker-4, started daemon)>
slept with: 8 <ForkProcess(ForkPoolWorker-1, started daemon)>

1 answer:

Answer (score: 5):

Chunksize doesn't influence how many cores are in use; that is set by the processes parameter of Pool. Chunksize sets how many items of the iterable you pass to Pool.map are handed to a single worker process at once, in what Pool calls a "task" (the figure below shows Python 3.7.1).

[figure: task_python_3.7.1]

If you set chunksize=1, a worker process is fed a new item, in a new task, only after finishing the one it received before. With chunksize > 1, a worker gets a whole batch of items at once within a task, and when it is done it gets the next batch, if any are left.
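A quick way to make this batching visible (a minimal sketch of my own, not taken from the question; the helper name which_worker is made up):

from multiprocessing import Pool, current_process


def which_worker(x):
    # Return the item together with the worker that processed it,
    # so the grouping of items into tasks becomes visible.
    return x, current_process().name


if __name__ == '__main__':
    with Pool(4) as pool:
        # 4 worker processes regardless of chunksize; with chunksize=3,
        # consecutive items 0-2, 3-5, ... tend to land on the same worker.
        for item, worker in pool.map(which_worker, range(12), chunksize=3):
            print(item, worker)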

Distributing items one by one with chunksize=1 increases flexibility in scheduling, but it lowers overall throughput, because drip feeding requires more inter-process communication (IPC).
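To get a feel for that IPC cost, you can time a trivially cheap taskel with both extremes (a rough sketch, not a rigorous benchmark; absolute numbers will vary by machine):

from multiprocessing import Pool
import time


def cheap(x):
    return x * x  # negligible work per item, so IPC overhead dominates


if __name__ == '__main__':
    items = list(range(100000))
    with Pool(4) as pool:
        for cs in (1, len(items) // 4):
            start = time.perf_counter()
            pool.map(cheap, items, chunksize=cs)
            # chunksize=1 creates one task (and IPC round-trip) per item,
            # which is typically much slower than a handful of big tasks.
            print("chunksize=%d: %.2fs" % (cs, time.perf_counter() - start))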

In my in-depth analysis of Pool's chunksize algorithm here, I define the unit of work for processing one item of the iterable as a taskel, to avoid naming conflicts with Pool's use of the word "task". A task (as a unit of work) consists of chunksize taskels.

You would set chunksize=1 if you cannot predict how long a taskel will take to finish, for example in an optimization problem where processing time varies greatly across taskels. Drip feeding here prevents a worker process from sitting on a pile of untouched items while it crunches on one heavy taskel, which would otherwise keep the remaining items in its task from being distributed to idling worker processes.
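A contrived sketch of that effect (my own example, with a few deliberately heavy taskels clustered at the front of the iterable):

from multiprocessing import Pool
import time


def uneven(x):
    # The first few taskels are much heavier than the rest.
    time.sleep(1.0 if x < 4 else 0.01)
    return x


if __name__ == '__main__':
    items = list(range(64))
    with Pool(4) as pool:
        for cs in (16, 1):
            start = time.perf_counter()
            pool.map(uneven, items, chunksize=cs)
            # With chunksize=16, one worker typically receives all four heavy
            # items in a single task (~4s wall time); with chunksize=1 they
            # get spread across the idle workers (~1s wall time).
            print("chunksize=%d: %.1fs" % (cs, time.perf_counter() - start))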

Otherwise, if all of your taskels need the same amount of time to finish, you can set chunksize = len(iterable) // processes, so that tasks are distributed only once across all workers. Note that this will produce one more task than there are processes (processes + 1) whenever len(iterable) % processes leaves a remainder. This has the potential to severely impact your overall computation time. Read more about this in the previously linked answer.
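For the question's numbers: 40 items on 4 workers gives chunksize = 40 // 4 = 10 and exactly 4 tasks, but 41 items would yield 5 tasks (four of 10 items plus one containing a single item). A tiny helper to check this (my own sketch, not part of Pool's API):

def n_tasks(n_items, n_processes):
    # Number of tasks produced when chunksize = n_items // n_processes.
    # Assumes n_items >= n_processes.
    chunksize = n_items // n_processes
    full, extra = divmod(n_items, chunksize)
    return full + (1 if extra else 0)


print(n_tasks(40, 4))  # 4 tasks of 10 items each
print(n_tasks(41, 4))  # 5 tasks: four of 10 items plus one with a single item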


FYI, this is the part of the source code where Pool internally computes the chunksize when it is not set (the snippet below is quoted from CPython's multiprocessing/pool.py, inside Pool._map_async; exact location and line numbers vary between Python versions):

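# Excerpt from multiprocessing/pool.py (Pool._map_async in recent CPython
# versions); len(self._pool) is the number of worker processes:
if chunksize is None:
    chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
    if extra:
        chunksize += 1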