Most efficient cumulative sum with numba?

Asked: 2018-09-07 21:33:55

Tags: python performance numba

I am trying to compute a cumulative sum over a 3D array in Python as efficiently as possible. I tried numpy's cumsum, but found that a manually parallelized version using numba beats it:

import numpy as np
from numba import njit, prange
from timeit import default_timer as timer

@njit(parallel=True)
def cpu_cumsum(data, output):
    # Copy the first element of every (i, j) row; the scan below fills in the rest.
    for i in prange(200):
        for j in prange(2000000):
            output[i, j, 0] = data[i, j, 0]

    # Running sum along the last axis (length 5) for every (i, j) row.
    for i in prange(200):
        for j in prange(2000000):
            for k in range(1, 5):
                output[i, j, k] = data[i, j, k] + output[i, j, k - 1]
    return output

data = np.float32(np.arange(2000000000).reshape(200, 2000000, 5))
output = np.empty_like(data)
func_start = timer()
output = cpu_cumsum(data, output)
timing=timer()-func_start
print("Function: manualCumSum duration (seconds):" + str(timing))

My method:
Function: manualCumSum duration (seconds):2.8496341188924994
np.cumsum:
Function: cumSum duration (seconds):6.182090314569933
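For reference, the np.cumsum timing above was presumably measured with something along these lines; the original timing code is not shown in the post, and cumsum along the last axis (axis=2) is an assumption:

import numpy as np
from timeit import default_timer as timer

data = np.float32(np.arange(2000000000).reshape(200, 2000000, 5))

func_start = timer()
output = np.cumsum(data, axis=2)  # cumulative sum along the last (length-5) axis
timing = timer() - func_start
print("Function: cumSum duration (seconds):" + str(timing))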

When I tried this with guvectorize, I found it used too much memory on my GPU, so I abandoned that route. Is there a better way to do this, or have I hit a dead end? PS: Speed matters because this has to be run in a loop many times.
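For context, the abandoned guvectorize attempt would have looked roughly like the sketch below. This is an assumed reconstruction, not code from the post: the per-row kernel, the gpu_cumsum name, and the target='cuda' choice are all guesses about the approach described.

from numba import guvectorize

# Sketch of a guvectorize kernel that scans one length-5 row at a time.
# With target='cuda' the full input array has to be moved to the GPU,
# which is the likely source of the memory pressure mentioned above.
@guvectorize(['void(float32[:], float32[:])'], '(n)->(n)', target='cuda')
def gpu_cumsum(row, out):
    acc = 0.0
    for k in range(row.shape[0]):
        acc += row[k]
        out[k] = acc

# output = gpu_cumsum(data)  # broadcasts over the leading (200, 2000000) axes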

0 Answers:

No answers yet