为什么第一次运行Python多处理池时开销会更高? 与以下运行相比有何不同?
import pandas as pd
import time
import multiprocessing
def foo(n):
for i in range(n):
for j in range(n):
for k in range(n):
accum = i + j + k
return(accum)
def test1(pool, n):
pool.map(foo, [n, n])
def test2(n):
foo(n)
foo(n)
if __name__ == "__main__":
rtn = []
pool = multiprocessing.Pool(processes=2)
for n in range(100, 1100, 100):
startTime = time.time()
test1(pool, n)
t1 = time.time() - startTime
print('t1: {0} second'.format(time.time() - startTime))
startTime = time.time()
test2(n)
t2 = time.time() - startTime
print('t2: {0} second'.format(time.time() - startTime))
rtn.append([n, t1, t2])
xx = pd.DataFrame(rtn, columns=['n', 't1', 't2'])
print(xx)
n t1 t2
0 100 3.843944 0.106006 <-------- t1 is much longer than t2
1 200 0.640689 1.000097
2 300 2.526334 4.140915
3 400 6.880183 11.183931
4 500 14.937281 25.981793
5 600 27.315186 39.802715
6 700 41.263902 60.289115
7 800 64.577426 95.624465
8 900 90.760957 132.725434
9 1000 120.575304 177.576586
答案 0 :(得分:0)
这是因为必须首先创建池。因此,python必须告诉操作系统创建子进程(在您的示例中为2)。一旦这些进程运行起来,python就可以利用这些进程并处理提交给池的任务。顺便说一句,您应该在完成后close
池中。
我喜欢三回路CPU压力源!希望我能解决您的问题!