I have a simple test program, shown below, written for and run under Python 3.6.3. It was executed on a machine with 4 cores.
import multiprocessing
import time

def f(num):
    print(multiprocessing.current_process(), num)
    time.sleep(1)
    if (num % 2):
        raise Exception

pool = multiprocessing.Pool(5)
try:
    pool.map(f, range(1,20))
except Exception as e:
    print("EXCEPTION")
pool.close()
pool.join()
Output with pool = multiprocessing.Pool(5):
<ForkProcess(ForkPoolWorker-1, started daemon)> 1
<ForkProcess(ForkPoolWorker-2, started daemon)> 2
<ForkProcess(ForkPoolWorker-3, started daemon)> 3
<ForkProcess(ForkPoolWorker-4, started daemon)> 4
<ForkProcess(ForkPoolWorker-5, started daemon)> 5
<ForkProcess(ForkPoolWorker-2, started daemon)> 6
<ForkProcess(ForkPoolWorker-1, started daemon)> 7
<ForkProcess(ForkPoolWorker-4, started daemon)> 8
<ForkProcess(ForkPoolWorker-3, started daemon)> 9
<ForkProcess(ForkPoolWorker-5, started daemon)> 10
<ForkProcess(ForkPoolWorker-2, started daemon)> 11
<ForkProcess(ForkPoolWorker-1, started daemon)> 12
<ForkProcess(ForkPoolWorker-4, started daemon)> 13
<ForkProcess(ForkPoolWorker-3, started daemon)> 14
<ForkProcess(ForkPoolWorker-5, started daemon)> 15
<ForkProcess(ForkPoolWorker-2, started daemon)> 16
<ForkProcess(ForkPoolWorker-1, started daemon)> 17
<ForkProcess(ForkPoolWorker-3, started daemon)> 18
<ForkProcess(ForkPoolWorker-4, started daemon)> 19
EXCEPTION
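However, if I change the number of processes in the pool to be equal to or less than the number of cores on my machine, not every call to f() prints its num.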
Output with pool = multiprocessing.Pool(4):
<ForkProcess(ForkPoolWorker-1, started daemon)> 1
<ForkProcess(ForkPoolWorker-2, started daemon)> 3
<ForkProcess(ForkPoolWorker-3, started daemon)> 5
<ForkProcess(ForkPoolWorker-2, started daemon)> 7
<ForkProcess(ForkPoolWorker-1, started daemon)> 9
<ForkProcess(ForkPoolWorker-3, started daemon)> 11
<ForkProcess(ForkPoolWorker-3, started daemon)> 13
<ForkProcess(ForkPoolWorker-1, started daemon)> 15
<ForkProcess(ForkPoolWorker-2, started daemon)> 17
<ForkProcess(ForkPoolWorker-1, started daemon)> 19
EXCEPTION
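I don't understand why these processes are being killed, especially since the exception is not raised until after the print statement in the function. And I really don't understand why this only happens when the number of processes in the pool is equal to or less than the number of cores on my machine.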
Answer 0 (score: 2)
Refer to the documentation for multiprocessing.Pool.map. You will see an optional argument, chunksize. If you set it to 1, i.e. pool.map(f, range(1,20), 1), you get the result you expected.
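To check this without relying on interleaved print output, here is a minimal sketch that records which values of num actually reach f(). The helper names (visited_numbers, _init_worker, _seen) and the shared-array bookkeeping are my own additions for illustration, not part of the question's code, and the sleep is dropped to keep it fast.

```python
import multiprocessing

_seen = None  # shared array; assigned inside each worker by _init_worker


def _init_worker(shared):
    """Store the shared array in a module global inside each worker."""
    global _seen
    _seen = shared


def f(num):
    _seen[num] = 1  # record that f() was entered for this num
    if num % 2:
        raise Exception("odd input: %d" % num)


def visited_numbers(chunksize, processes=4):
    """Run pool.map over range(1, 20); return the nums for which f() ran."""
    shared = multiprocessing.Array("i", 20)  # zero-initialised ints
    pool = multiprocessing.Pool(
        processes, initializer=_init_worker, initargs=(shared,)
    )
    try:
        pool.map(f, range(1, 20), chunksize)
    except Exception:
        pass  # expected: every odd num raises
    finally:
        pool.close()
        pool.join()  # let the remaining chunks finish before reading
    return [n for n in range(1, 20) if shared[n]]


if __name__ == "__main__":
    print("chunksize=1:", visited_numbers(1))
    print("chunksize=2:", visited_numbers(2))
```

With chunksize=1 every number from 1 to 19 should be listed; with chunksize=2 only the odd numbers should appear, which matches the Pool(4) output in the question.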
If you increase the chunk size (to 6, for example), you may see:
<SpawnProcess(SpawnPoolWorker-1, started daemon)> 1
<SpawnProcess(SpawnPoolWorker-4, started daemon)> 7
<SpawnProcess(SpawnPoolWorker-3, started daemon)> 13
<SpawnProcess(SpawnPoolWorker-2, started daemon)> 19
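This suggests that chunksize tasks are handed to a single worker process in the pool as one unit. When f() raises an exception partway through a chunk, the remaining tasks in that chunk are, of course, never executed.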
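The chunk behaviour can be reproduced without any processes at all. The sketch below is a single-process simulation, assuming (as in CPython's pool implementation, as far as I can tell) that a worker handles one chunk roughly as list(map(f, chunk)); the names chunks and simulate are hypothetical helpers of mine.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive tuples of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, size))
        if not chunk:
            return
        yield chunk


def simulate(chunksize, numbers=range(1, 20)):
    """Return the numbers for which f() would be entered."""
    entered = []

    def f(num):
        entered.append(num)
        if num % 2:
            raise Exception

    for chunk in chunks(numbers, chunksize):
        try:
            list(map(f, chunk))  # one chunk is one unit of work for a worker
        except Exception:
            pass  # the rest of this chunk is never reached
    return entered


print(simulate(6))  # [1, 7, 13, 19]
print(simulate(2))  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(simulate(1))  # all of 1..19
```

The first two results line up with the chunksize=6 output above and with the Pool(4) output in the question.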
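From this you can infer that the effective default chunksize in the Pool(4) run is 2. It is not a fixed constant: when chunksize is omitted, CPython derives it from the length of the input and the number of workers, which is why Pool(5) ended up with chunks of 1 and Pool(4) with chunks of 2. The match with the core count is a coincidence. The reason the parameter exists is to reduce the number of separate tasks that have to be dispatched to the workers, which saves resources and processing time when the chunk size is chosen well.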
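For the numbers in the question, the default can be worked out by hand. The function below mirrors the heuristic as I read it in CPython's multiprocessing/pool.py (roughly four chunks per worker, rounded up). It is an implementation detail rather than a documented guarantee, so treat it as an assumption that may differ between versions; default_chunksize is my own name for it.

```python
def default_chunksize(n_items, n_workers):
    """Approximate the chunksize Pool.map picks when none is given."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1  # round up so every item lands in some chunk
    return chunksize


print(default_chunksize(19, 5))  # 1 -> every call to f() runs
print(default_chunksize(19, 4))  # 2 -> each even num shares a chunk with a failing odd num
```

So the difference between the two runs comes from the pool size changing the chunk size, not from the number of cores as such.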