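Confusion about Python multiprocessing timing

Time: 2016-11-17 10:59:43

Tags: python python-multiprocessing

I am trying to do some simple parallel processing with a multiprocessing Pool. Here is the code.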

from math import sqrt
from timeit import default_timer as timer
from multiprocessing import Pool as WorkerPool
def squareNumber(n):
    return sqrt(n ** 2)

def calculateSeq(numbers):
    results = map(squareNumber, numbers)
    return results

def calculatePar(numbers, threads=4):
    pool = WorkerPool(processes=threads)
    results = pool.map(squareNumber, numbers)
    pool.close()
    pool.join()
    return results

if __name__ == "__main__":
    N = int(10e6)
    numbers = range(N)

    start = timer()
    resultSeq = calculateSeq(numbers)
    end = timer()
    seqtime = (end-start)

    start = timer()
    resultPar = calculatePar(numbers, 4)
    end = timer()
    partime = (end-start)

    print "sequential time : ", seqtime*1e3, " ms"
    print "parallel time : ", partime*1e3, " ms"

    assert resultPar == resultSeq

On my 2-core (4-thread) machine, I get the following timings:

sequential time :  2532.51695633  ms
parallel time :  2983.89601707  ms

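According to this, a Pool spawns independent processes, and I have been able to verify that. So this is not a GIL problem. Why is the parallel version slower?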
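One way to check that a Pool really runs work in separate processes is to have each task report its process ID. The following is only a sketch, written for Python 3; the names `worker_pid` and `pool_worker_pids` are hypothetical helpers and do not come from the code above.

```python
import os
from multiprocessing import Pool


def worker_pid(_):
    # Hypothetical helper: ignore the input and report the ID of the
    # process that is executing this task.
    return os.getpid()


def pool_worker_pids(processes=4, tasks=100):
    # Run many tiny tasks and collect the distinct process IDs seen.
    pool = Pool(processes=processes)
    try:
        pids = set(pool.map(worker_pid, range(tasks)))
    finally:
        pool.close()
        pool.join()
    return pids


if __name__ == "__main__":
    print("parent pid:", os.getpid())
    print("worker pids:", sorted(pool_worker_pids()))
```

If the worker IDs differ from the parent ID, the tasks ran outside the parent interpreter. With tasks this small, a single worker may pick up most of them, so fewer than four distinct IDs can show up.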

I also implemented almost the same amount of computation using Process, as follows:

from multiprocessing import Process
from math import sqrt
from timeit import default_timer as timer

def worker(n):
    out = [ sqrt(i ** 2) for i in range(n)]
    print sum(out)

def parworker(N):
    jobs = []
    nThreads = 4
    # Floor division keeps the per-process count an integer
    NPerThread = N // nThreads

    for i in range(nThreads):
        p = Process(target=worker, args=(NPerThread,))
        jobs.append(p)
        p.start()

    for p in jobs:
        p.join()

I benchmarked it with the same input size and the same timing function, and got the following execution times:

sequential time :  2339.83802795  ms
parallel time :  968.728780746  ms

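In this case I do get a speedup (~2.4x), which makes sense for my machine. Both Process and Pool use process-level parallelism, so why don't I get a speedup in the first case? Or am I doing something wrong?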
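One difference between the two versions may be worth measuring. The `pool.map` version sends all 10 million inputs to the workers and sends every result back to the parent, while the Process version passes one integer to each worker and returns nothing. The sketch below is written for Python 3 and uses hypothetical names (`sum_chunk`, `make_bounds`, `calculate_chunked`). It keeps the Pool but hands each worker only a `(start, stop)` pair and gets back a single partial sum. It is meant for experimenting with that difference and is not a confirmed explanation of the timings above.

```python
from math import sqrt
from multiprocessing import Pool


def sum_chunk(bounds):
    # Hypothetical helper: same per-element work as squareNumber, but the
    # whole (start, stop) range is generated and reduced inside the worker.
    start, stop = bounds
    return sum(sqrt(i ** 2) for i in range(start, stop))


def make_bounds(n, parts):
    # Split range(n) into `parts` contiguous (start, stop) pairs.
    step = n // parts
    bounds = [(k * step, (k + 1) * step) for k in range(parts)]
    # The last chunk absorbs any remainder.
    bounds[-1] = (bounds[-1][0], n)
    return bounds


def calculate_chunked(n, processes=4):
    # Only `processes` small tuples go to the workers, and only
    # `processes` floats come back.
    pool = Pool(processes=processes)
    try:
        partial_sums = pool.map(sum_chunk, make_bounds(n, processes))
    finally:
        pool.close()
        pool.join()
    return sum(partial_sums)


if __name__ == "__main__":
    print(calculate_chunked(10 ** 5))
```

Timing `calculate_chunked(N)` with the same `timer` calls as above, next to `calculatePar`, would show how much of the Pool version's time goes to moving data instead of computing.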

0 answers:

No answers yet