To determine which step was consuming most of the compute time, I ran cProfile and got the following results:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.014 0.014 216.025 216.025 func_poolasync.py:2(<module>)
11241 196.589 0.017 196.589 0.017 {method 'acquire' of 'thread.lock' objects}
982 0.010 0.000 196.532 0.200 threading.py:309(wait)
1000 0.002 0.000 196.498 0.196 pool.py:565(get)
1000 0.005 0.000 196.496 0.196 pool.py:557(wait)
515856/3987 0.350 0.000 13.434 0.003 artist.py:230(stale)
Clearly most of the time is spent in method 'acquire' of 'thread.lock' objects. I am not using threads; instead I am using pool.apply_async with several processes, so I am confused as to why thread.lock is the problem. I would like to understand why this is the bottleneck, and how this time can be reduced.
The code is as follows:
import pickle
import time
from multiprocessing import Pool

# func (the worker function) is defined elsewhere and not shown here

path = '/usr/home/work'
filename = 'filename'
with open(path + filename + '/' + 'result.pickle', 'rb') as f:
    pdata = pickle.load(f)

if __name__ == '__main__':
    pool = Pool()
    data = list(range(1000))
    print('START')
    start_time = int(round(time.time()))
    result_objects = [pool.apply_async(func, args=(nomb, pdata[0], pdata[1], pdata[2]))
                      for nomb in data]
    results = [r.get() for r in result_objects]
    pool.close()
    pool.join()
    print('END', int(round(time.time())) - start_time)
Update:
By switching from pool.apply_async to pool.map, I was able to cut the execution time by roughly a factor of 3.
Output:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.113 0.113 70.824 70.824 func.py:2(<module>)
4329 28.048 0.006 28.048 0.006 {method 'acquire' of 'thread.lock' objects}
4 0.000 0.000 28.045 7.011 threading.py:309(wait)
1 0.000 0.000 28.044 28.044 pool.py:248(map)
1 0.000 0.000 28.044 28.044 pool.py:565(get)
1 0.000 0.000 28.044 28.044 pool.py:557(wait)
Revised code:
import time
from functools import partial
from multiprocessing import Pool

# func and pdata are defined as in the original version above

if __name__ == '__main__':
    pool = Pool()
    data = list(range(1000))
    print('START')
    start_time = int(round(time.time()))
    funct = partial(func, pdata[0], pdata[1], pdata[2])
    results = pool.map(funct, data)
    print('END', int(round(time.time())) - start_time)
However, I have found that some iterations now produce nonsensical results. I am not sure why this happens, but I can see that the rate-determining step is still method 'acquire' of 'thread.lock' objects.
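One thing worth checking, assuming func's signature matches the original call func(nomb, pdata[0], pdata[1], pdata[2]): partial binds its arguments to the leftmost parameters, so partial(func, pdata[0], pdata[1], pdata[2]) makes map pass nomb as the fourth argument rather than the first. A minimal demonstration with a stand-in func that just reports its arguments:

```python
from functools import partial

def func(nomb, a, b, c):
    # stand-in for the real worker; simply returns its arguments
    return (nomb, a, b, c)

# Original call order: func(nomb, pdata[0], pdata[1], pdata[2])
print(func(7, 'a', 'b', 'c'))       # (7, 'a', 'b', 'c')

# partial binds positionally from the left, so nomb lands in the last slot:
bound = partial(func, 'a', 'b', 'c')
print(bound(7))                     # ('a', 'b', 'c', 7)

# Binding with keywords instead keeps nomb in the first slot:
bound_kw = partial(func, a='a', b='b', c='c')
print(bound_kw(7))                  # (7, 'a', 'b', 'c')
```

If func's parameters are order-sensitive, this reversal would quietly produce wrong answers for exactly the kind of "nonsensical results" described above.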