Possible reasons why Pool.map is not using all available resources

Time: 2019-11-13 08:05:49

Tags: python multiprocessing

I am running the following code:

from multiprocessing import Pool


def loop_f(x, num_loops):
    for i in range(num_loops):
        f(x)
    return 

def f(x):
    result = 0 
    for i in range(x):
        result = result*i
    return result

x = 200000
num_times=200
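for i in range(8):
    p = Pool(i + 1)
    print("Timing when using ", i + 1, " processes")
    # %time is an IPython magic, so this loop runs in IPython/Jupyter only
    %time res = p.map(f, [x] * num_times)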

Now, when I run this code, I see that the performance improvement stops after the fourth process:

Timing when using  1  processes
CPU times: user 9.08 ms, sys: 13.4 ms, total: 22.5 ms
Wall time: 1.17 s
Timing when using  2  processes
CPU times: user 0 ns, sys: 12.1 ms, total: 12.1 ms
Wall time: 598 ms
Timing when using  3  processes
CPU times: user 5.51 ms, sys: 5.6 ms, total: 11.1 ms
Wall time: 467 ms
Timing when using  4  processes
CPU times: user 9.1 ms, sys: 479 µs, total: 9.58 ms
Wall time: 348 ms
Timing when using  5  processes
CPU times: user 4.15 ms, sys: 4.51 ms, total: 8.66 ms
Wall time: 352 ms
Timing when using  6  processes
CPU times: user 6.85 ms, sys: 2.74 ms, total: 9.59 ms
Wall time: 343 ms
Timing when using  7  processes
CPU times: user 2.79 ms, sys: 7.16 ms, total: 9.95 ms
Wall time: 349 ms
Timing when using  8  processes
CPU times: user 9.06 ms, sys: 427 µs, total: 9.49 ms
Wall time: 362 ms
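
As a rough reading of the numbers above, the wall times can be converted into speedup and parallel-efficiency figures. This is only a sketch: the eight wall times are hard-coded from the output, and the names `wall_times`, `speedups`, and `efficiencies` are illustrative, not part of the original code.

```python
# Wall times (seconds) copied from the timing output above, keyed by pool size.
wall_times = {1: 1.17, 2: 0.598, 3: 0.467, 4: 0.348,
              5: 0.352, 6: 0.343, 7: 0.349, 8: 0.362}

baseline = wall_times[1]

# Speedup: how many times faster than the single-process run.
speedups = {n: baseline / t for n, t in wall_times.items()}
# Efficiency: speedup divided by the number of processes (1.0 is ideal scaling).
efficiencies = {n: speedups[n] / n for n in wall_times}

for n in sorted(wall_times):
    print(f"{n} processes: speedup {speedups[n]:.2f}x, "
          f"efficiency {efficiencies[n]:.0%}")
```

The speedup stays below 4x for every pool size, and adding processes beyond four does not improve it in any meaningful way.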

However, when I check my system, it looks like I should have 8 processor cores available.

import multiprocessing
import os

print(multiprocessing.cpu_count())
print(len(os.sched_getaffinity(0)))
8
8

So what is happening, or what could be happening? How can I maximize system performance?

3 Answers:

Answer 0: (score: 2)

You should create only one pool.

from multiprocessing import Pool

def f(x):
    j = 0
    for i in range(1000000):
        j += i

    return x*x

if __name__ == '__main__':
    with Pool(8) as p:
        print(p.map(f, range(1000)))

The above kept my eight threads busy for a while.

Answer 1: (score: 1)

My machine actually has only 4 cores: https://ark.intel.com/content/www/us/en/ark/products/75056/intel-xeon-processor-e3-1270-v3-8m-cache-3-50-ghz.html

import multiprocessing
import os

print(multiprocessing.cpu_count())
print(len(os.sched_getaffinity(0)))

This does not report the number of physical cores, only the number of hardware threads (logical CPUs).
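
One way to see the difference on Linux is to count unique cores in `/proc/cpuinfo`. The sketch below assumes the file exposes `physical id` and `core id` fields, which is typical on x86 systems but not guaranteed elsewhere; `count_physical_cores` is a hypothetical helper, not a standard-library function.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count unique (physical id, core id) pairs in /proc/cpuinfo-style text.

    Returns None when the text carries no such fields.
    """
    cores = set()
    # Each logical CPU is described by one block; blocks are separated by blank lines.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "physical id" in fields and "core id" in fields:
            cores.add((fields["physical id"], fields["core id"]))
    return len(cores) or None


if __name__ == "__main__":
    print("logical CPUs:", os.cpu_count())
    try:
        with open("/proc/cpuinfo") as fh:
            print("physical cores:", count_physical_cores(fh.read()))
    except OSError:
        print("physical cores: unknown (no readable /proc/cpuinfo)")
```

On a 4-core CPU with two hardware threads per core, the first number would be 8 and the second 4, which is consistent with a CPU-bound workload that stops scaling after four processes.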

Answer 2: (score: 0)

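multiprocessing.Pool() declares the number of worker processes you want to run, usually one per core. The different methods of Pool determine how the work is distributed across those processes.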

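The first problem in your code is that you initialize a new pool, with a different number of workers, on every iteration. The second is that once your workers have finished their tasks, you should close the pool and join them.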

I rewrote the code:

from multiprocessing import Pool
import multiprocessing as mp

def f(x):
    j = 0
    for i in range(1000000):
        j += i
    return x*x

if __name__ == '__main__':
    # Create one pool with as many workers as the machine reports CPUs
    p = Pool(mp.cpu_count())
    res = p.map_async(f, range(1000))

    p.close()  # no more tasks will be submitted to the pool
    p.join()   # wait for all workers to finish
    print(res.get())
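
For comparing pool sizes without leaving old pools alive, a minimal sketch follows. It assumes a plain Python script rather than IPython, so `time.perf_counter` stands in for `%time`; `time_pool` is a hypothetical helper and its default arguments simply mirror the values in the question.

```python
import os
import time
from multiprocessing import Pool


def f(x):
    # CPU-bound busy loop with the same shape as the workload in the question.
    result = 0
    for i in range(x):
        result = result * i
    return result


def time_pool(n_procs, x=200000, num_tasks=200):
    """Return the wall time in seconds for one map of num_tasks tasks."""
    # Leaving the with-block terminates the workers, so pools do not pile up
    # across iterations of the benchmark loop.
    with Pool(n_procs) as pool:
        start = time.perf_counter()
        pool.map(f, [x] * num_tasks)
        return time.perf_counter() - start


if __name__ == "__main__":
    for n in range(1, (os.cpu_count() or 1) + 1):
        print(f"{n} processes: {time_pool(n):.3f} s")
```

The timer starts after the pool is created, so worker start-up cost is excluded, just as in the original `%time` measurement of `p.map` alone.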