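I am trying to compute a cumulative sum over a 3D array in Python as fast as possible. I tried numpy's cumsum, but found that a manually parallelized version using numba is faster: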
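import numpy as np
from numba import njit, prange
from timeit import default_timer as timer

@njit(parallel=True)
def cpu_cumsum(data, output):
    # Copy the first element along the last axis.
    # Only the outer prange is parallelized; the inner loops run serially per thread.
    for i in prange(data.shape[0]):
        for j in range(data.shape[1]):
            output[i, j, 0] = data[i, j, 0]
    # Accumulate along the last axis. i and j start at 0 so that the first
    # row and first column are filled too (starting at 1 left them uninitialized).
    for i in prange(data.shape[0]):
        for j in range(data.shape[1]):
            for k in range(1, data.shape[2]):
                output[i, j, k] = data[i, j, k] + output[i, j, k - 1]
    return output

data = np.float32(np.arange(2000000000).reshape(200, 2000000, 5))
output = np.empty_like(data)
func_start = timer()
output = cpu_cumsum(data, output)  # the first call also includes JIT compilation time
timing = timer() - func_start
print("Function: manualCumSum duration (seconds):" + str(timing))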
My method:
Function: manualCumSum duration (seconds):2.8496341188924994
np.cumsum:
Function: cumSum duration (seconds):6.182090314569933
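The code behind the np.cumsum timing is not shown. As a rough sketch of how such a baseline could be measured, assuming the sum runs along the last axis (axis=2, matching the k loop above) and using a much smaller stand-in array so it runs quickly:

```python
import numpy as np
from timeit import default_timer as timer

# Small stand-in shape (an assumption for this sketch);
# the post itself uses (200, 2000000, 5).
data = np.arange(20 * 1000 * 5, dtype=np.float32).reshape(20, 1000, 5)
output = np.empty_like(data)

func_start = timer()
# axis=2 accumulates along the last dimension, like the k loop above.
# out= writes into the preallocated array instead of allocating a new one.
np.cumsum(data, axis=2, out=output)
timing = timer() - func_start
print("Function: cumSum duration (seconds):" + str(timing))

# Sanity checks: the first element is copied unchanged, and the last
# element of each row equals the sum over that row.
assert np.array_equal(output[..., 0], data[..., 0])
assert np.array_equal(output[..., -1], data.sum(axis=2))
```

Timings on this small array say nothing about the full-size case; the shape only keeps the sketch cheap to run.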
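When I tried guvectorize, I found that it used too much memory on my GPU, so I gave up on that approach. Is there a better way to do this, or have I hit a dead end? PS: speed matters because this has to run many times in a loop.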
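For context on the memory problem, the array sizes can be worked out from the shape and dtype in the code above. This is only back-of-the-envelope arithmetic; the GPU model and its memory size are not stated in the post, so whether this alone explains the failure is an assumption.

```python
import numpy as np

# Shape and dtype taken from the code above.
shape = (200, 2000000, 5)
n_elements = int(np.prod(shape, dtype=np.int64))
bytes_per_array = n_elements * np.dtype(np.float32).itemsize

# One input array plus one output array of the same shape.
total_gb = 2 * bytes_per_array / 1e9
print(f"elements per array: {n_elements}")
print(f"one float32 array:  {bytes_per_array / 1e9:.1f} GB")
print(f"input + output:     {total_gb:.1f} GB")
```

So the input and output together need about 16 GB before any temporaries, which is more than many GPUs provide.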