I am trying to use a multiprocessing Pool for some simple parallel processing. Here is the code:

from math import sqrt
from timeit import default_timer as timer
from multiprocessing import Pool as WorkerPool

def squareNumber(n):
    return sqrt(n ** 2)

# Sequential version (Python 2: map returns a list)
def calculateSeq(numbers):
    results = map(squareNumber, numbers)
    return results

# Parallel version: distribute the same work over a pool of worker processes
def calculatePar(numbers, threads=4):
    pool = WorkerPool(processes=threads)
    results = pool.map(squareNumber, numbers)
    pool.close()
    pool.join()
    return results

if __name__ == "__main__":
    N = int(10e6)
    numbers = range(N)

    # Time the sequential run
    start = timer()
    resultSeq = calculateSeq(numbers)
    end = timer()
    seqtime = (end-start)

    # Time the parallel run with 4 worker processes
    start = timer()
    resultPar = calculatePar(numbers, 4)
    end = timer()
    partime = (end-start)

    print "sequential time : ", seqtime*1e3, " ms"
    print "parallel time : ", partime*1e3, " ms"
    assert resultPar == resultSeq

On my 2-core (4-thread) machine I get the following timings:
sequential time : 2532.51695633 ms
parallel time : 2983.89601707 ms
Based on this, Pool spawns separate processes, and I have verified that it does, so this is not a GIL issue. Why is the parallel version slower?
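One thing that could be checked at this point is how much time goes into moving data rather than computing. Pool.map has to pickle the inputs to send them to the worker processes and pickle the results to send them back. The sketch below is only a rough probe under that assumption: it times one pickle round trip inside a single process, which approximates but does not reproduce the real transfer between processes. The helper names time_pickle_roundtrip and time_compute are made up for illustration, and N is reduced from the value used above so the check runs quickly. It is written to run on Python 2 and Python 3.

```python
import pickle
from math import sqrt
from timeit import default_timer as timer


def time_pickle_roundtrip(values):
    """Return (seconds, copy) for one pickle dumps/loads round trip of values."""
    start = timer()
    payload = pickle.dumps(values, protocol=pickle.HIGHEST_PROTOCOL)
    copy = pickle.loads(payload)
    return timer() - start, copy


def time_compute(values):
    """Return (seconds, results) for the sequential sqrt(n ** 2) computation."""
    start = timer()
    results = [sqrt(n ** 2) for n in values]
    return timer() - start, results


if __name__ == "__main__":
    N = int(1e6)  # assumption: smaller than the 10e6 above, to keep the probe fast
    numbers = list(range(N))

    compute_s, results = time_compute(numbers)
    send_s, _ = time_pickle_roundtrip(numbers)  # stands in for inputs sent to workers
    recv_s, _ = time_pickle_roundtrip(results)  # stands in for results sent back

    print("compute           : %.1f ms" % (compute_s * 1e3))
    print("pickle round trips: %.1f ms" % ((send_s + recv_s) * 1e3))
```

If the two numbers turn out to be of similar size, the per-element work in squareNumber is probably too small to pay for shipping every element across process boundaries.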
I also implemented almost the same amount of computation using Process, as follows:

from multiprocessing import Process
from math import sqrt
from timeit import default_timer as timer

def worker(n):
    out = [ sqrt(i ** 2) for i in range(n)]
    print sum(out)

def parworker(N):
    jobs = []
    nThreads = 4
    NPerThread = N/nThreads
    for i in range(nThreads):
        p = Process(target=worker, args=(NPerThread,))
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()

I benchmarked it with the same input size and the same timing functions, and got the following execution times:
sequential time : 2339.83802795 ms
parallel time : 968.728780746 ms
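In this case I do get a speedup (~2.4x), which makes sense for my machine. Both Process and Pool use process-level parallelism, so why don't I get a speedup in the first case? Or am I doing something wrong?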
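The two experiments are not quite equivalent: in the Process version each worker receives a single integer and builds its own range, while in the Pool version every one of the N inputs and N results crosses a process boundary. A minimal sketch of a Pool variant that behaves more like the Process version is shown here, assuming a sum of the results is enough (as in worker above). The names split_range, partial_sum and sum_par are hypothetical helpers, not part of multiprocessing, and the code is written to run on Python 2 and Python 3.

```python
from math import sqrt
from multiprocessing import Pool


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) pairs."""
    step, extra = divmod(n, parts)
    bounds = []
    start = 0
    for i in range(parts):
        # The first `extra` slices take one additional element each.
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partial_sum(bounds):
    """Worker: sum sqrt(i ** 2) over one (start, stop) slice."""
    start, stop = bounds
    return sum(sqrt(i ** 2) for i in range(start, stop))


def sum_par(n, processes=4):
    """Send only `processes` small tuples out and get `processes` floats back."""
    pool = Pool(processes=processes)
    try:
        return sum(pool.map(partial_sum, split_range(n, processes)))
    finally:
        pool.close()
        pool.join()
```

Calling sum_par(int(10e6), 4) under an if __name__ == "__main__": guard and timing it with the same timer as above would show whether Pool keeps up with Process once the per-element data transfer is removed.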