What is the fastest way to loop over a 3D NumPy array if I need to know the indices?
... some sort of loop ...
    do something with each element that requires knowing i, j, k.
E.g.:
for i in range(N):
    for j in range(N):
        for k in range(N):
            index = ...  # something that depends on i, j, k
            B[index] = A[i][j][k]**2
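For the generic pattern above, NumPy's `np.ndenumerate` yields each index tuple together with its element, which answers the literal "loop with indices" question. `np.indices` goes one step further and returns whole arrays of `i`, `j` and `k`, so the loop can be dropped entirely. This is a sketch only: `index_of` is a hypothetical index rule chosen for illustration (the question leaves the rule unspecified), picked to be one-to-one so no two elements write to the same slot of `B`.

```python
import numpy as np

N = 4
A = np.random.rand(N, N, N)

def index_of(i, j, k):
    # Hypothetical index rule, for illustration only. It is one-to-one,
    # so every (i, j, k) triple writes to a distinct slot of B.
    return (k * N + j) * N + i

# Looping version: np.ndenumerate yields ((i, j, k), value) pairs.
B_loop = np.zeros(N**3)
for (i, j, k), a in np.ndenumerate(A):
    B_loop[index_of(i, j, k)] = a**2

# Vectorized version: np.indices returns three (N, N, N) arrays holding the
# i, j and k coordinate of every element, so index_of runs elementwise.
I, J, K = np.indices(A.shape)
B_vec = np.zeros(N**3)
B_vec[index_of(I, J, K)] = A**2
```

The vectorized half is the same idea the answer applies to the real loop: replace the index variables by index arrays and let NumPy do the iteration.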
The actual loop looks like this:
from math import pi, sqrt

for z in range(Ngrid):
    kz = 2*pi/LMAX*(z - Ngrid/2)
    for y in range(Ngrid):
        ky = 2*pi/LMAX*(y - Ngrid/2)
        for x in range(Ngrid):
            kx = 2*pi/LMAX*(x - Ngrid/2)
            kk = sqrt(kx**2 + ky**2 + kz**2)
            bind = int((kk - kmin)/dk)
            if bind >= Nk:
                continue
            delk = delta_k[x][y][z]
            Pk[bind] += (delk.real**2 + delk.imag**2)/2
            Numk[bind] += 1
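Written out, the loop bins the grid cells by the magnitude of their wavevector. With N = Ngrid, L = LMAX, Δk = dk and N_k = Nk, and assuming every computed bin index is non-negative (true whenever kmin < dk, because |k| ≥ 0 and int() truncates toward zero), it performs:

```latex
\begin{aligned}
k_x &= \tfrac{2\pi}{L}\,(x - N/2), \quad k_y = \tfrac{2\pi}{L}\,(y - N/2), \quad k_z = \tfrac{2\pi}{L}\,(z - N/2) \\
|k| &= \sqrt{k_x^2 + k_y^2 + k_z^2} \\
b(x,y,z) &= \operatorname{trunc}\!\left(\frac{|k| - k_{\min}}{\Delta k}\right) \\
P_k[b] &\mathrel{+}= \sum_{(x,y,z)\,:\,b(x,y,z) = b} \tfrac{1}{2}\left[(\operatorname{Re}\,\delta_k[x,y,z])^2 + (\operatorname{Im}\,\delta_k[x,y,z])^2\right], \quad 0 \le b < N_k \\
\mathrm{Numk}[b] &\mathrel{+}= \#\{(x,y,z) : b(x,y,z) = b\}, \quad 0 \le b < N_k
\end{aligned}
```

The point that makes vectorization possible is that |k| depends on (x, y, z) only through three 1-D arrays k_x, k_y, k_z, which are all the same function of their own index.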
Answer (score: 2)
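The fastest way to solve a problem like this is not to loop at all: if the problem is parallelizable/vectorizable, use NumPy's array tools instead. The problem at hand appears to be vectorizable, and it is very similar to a previous Q&A, so we'll borrow some ideas from that post, mainly around using broadcasting.
So we would have a solution like this: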
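A small check of the broadcasting step: indexing a 1-D array as `K[:, None, None]`, `K[:, None]` and `K` gives shapes (N, 1, 1), (N, 1) and (N,), which NumPy broadcasts to a full (N, N, N) grid whose entry [x, y, z] is K[x]**2 + K[y]**2 + K[z]**2. The sizes below are arbitrary, chosen only to keep the comparison loop short.

```python
import numpy as np

N, LMAX = 6, 3.45
K = 2 * np.pi / LMAX * (np.arange(N) - N / 2)  # 1-D wavenumbers, one per index

# Shapes (N, 1, 1) + (N, 1) + (N,) broadcast to (N, N, N).
KK = np.sqrt(K[:, None, None]**2 + K[:, None]**2 + K**2)

# The same grid built with an explicit triple loop, for comparison.
KK_loop = np.empty((N, N, N))
for x in range(N):
    for y in range(N):
        for z in range(N):
            KK_loop[x, y, z] = np.sqrt(K[x]**2 + K[y]**2 + K[z]**2)
```

Only three length-N arrays are computed; the N**3 combinations come from broadcasting, which is where the speedup over the Python-level loop comes from.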
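# 1-D wavenumbers; the same array serves the x, y and z axes
KXYZ = 2*np.pi/LMAX*(np.arange(Ngrid) - Ngrid/2)
# Broadcast (N,1,1) + (N,1) + (N,) -> (N,N,N) grid of |k|
KK = np.sqrt(KXYZ[:,None,None]**2 + KXYZ[:,None]**2 + KXYZ**2)
# Bin index of every cell (astype(int) truncates toward zero, like int())
BIND = ((KK - kmin)/dk).astype(int)
# Same rule as the loop's "if bind >= Nk: continue"
valid_mask = BIND<Nk
IDs = BIND[valid_mask]
vals = (delta_k.real[valid_mask]**2 + delta_k.imag[valid_mask]**2)/2
# Sum vals per bin and count cells per bin, replacing the += updates
Pk += np.bincount(IDs, vals, minlength=len(Pk))
Numk += np.bincount(IDs, minlength=len(Numk))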
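The last two lines replace the per-element `Pk[bind] += ...` and `Numk[bind] += 1` updates. `np.bincount(ids, weights, minlength)` sums the weights that land in each integer bin, so it matches an explicit accumulation loop even when an ID repeats; `np.add.at` is an in-place alternative with the same accumulation behaviour. A plain fancy-index `out[ids] += w` is not a substitute, because a repeated ID receives only one of its contributions. The toy IDs and weights below are made up for illustration.

```python
import numpy as np

ids = np.array([0, 2, 2, 1])          # bin index of each element (toy values)
w = np.array([1.0, 2.0, 3.0, 4.0])    # value to add into that bin

# Reference: explicit accumulation loop, like Pk[bind] += ...
loop_sum = np.zeros(5)
for i, v in zip(ids, w):
    loop_sum[i] += v

# bincount sums weights per bin; minlength pads unused trailing bins.
bc_sum = np.bincount(ids, w, minlength=5)   # array([1., 4., 5., 0., 0.])
counts = np.bincount(ids, minlength=5)      # array([1, 1, 2, 0, 0])

# np.add.at accumulates in place and also handles repeated indices.
at_sum = np.zeros(5)
np.add.at(at_sum, ids, w)

# Plain fancy-index += does NOT accumulate repeats: bin 2 gets only one of
# its two contributions (2.0 or 3.0) instead of 5.0.
naive = np.zeros(5)
naive[ids] += w
```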
运行时测试
Approaches:
import numpy as np

def loopy_app(Ngrid, LMAX, kmin, dk, Nk, delta_k):
    Pk = np.zeros(Nk)
    Numk = np.zeros(Nk)
    for z in range(Ngrid):
        kz = 2*np.pi/LMAX*(z - Ngrid/2)
        for y in range(Ngrid):
            ky = 2*np.pi/LMAX*(y - Ngrid/2)
            for x in range(Ngrid):
                kx = 2*np.pi/LMAX*(x - Ngrid/2)
                kk = np.sqrt(kx**2 + ky**2 + kz**2)
                bind = int((kk - kmin)/dk)
                if bind >= Nk:
                    continue
                delk = delta_k[x,y,z]
                Pk[bind] += (delk.real**2 + delk.imag**2)/2
                Numk[bind] += 1
    return Pk, Numk

def vectorized_app(Ngrid, LMAX, kmin, dk, Nk, delta_k):
    Pk = np.zeros(Nk)
    Numk = np.zeros(Nk)
    KXYZ = 2*np.pi/LMAX*(np.arange(Ngrid) - Ngrid/2)
    KK = np.sqrt(KXYZ[:,None,None]**2 + KXYZ[:,None]**2 + KXYZ**2)
    BIND = ((KK - kmin)/dk).astype(int)
    valid_mask = BIND<Nk
    IDs = BIND[valid_mask]
    vals = (delta_k.real[valid_mask]**2 + delta_k.imag[valid_mask]**2)/2
    Pk += np.bincount(IDs, vals, minlength=len(Pk))
    Numk += np.bincount(IDs, minlength=len(Numk))
    return Pk, Numk
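One caveat worth checking for other parameter values: both approaches only drop bins at or above Nk. If kmin ≥ dk, cells near |k| = 0 get a negative bin index (int() and astype(int) both truncate toward zero); the loop then wraps around to the end of Pk through Python's negative indexing, while np.bincount raises a ValueError on negative input. A minimal sketch, assuming negative bins should simply be discarded, adds a lower bound to both versions. The function names are hypothetical, and the parameters in the usage lines are picked only to trigger the negative-bin case.

```python
import numpy as np

def loopy_guarded(Ngrid, LMAX, kmin, dk, Nk, delta_k):
    Pk, Numk = np.zeros(Nk), np.zeros(Nk)
    for z in range(Ngrid):
        kz = 2*np.pi/LMAX*(z - Ngrid/2)
        for y in range(Ngrid):
            ky = 2*np.pi/LMAX*(y - Ngrid/2)
            for x in range(Ngrid):
                kx = 2*np.pi/LMAX*(x - Ngrid/2)
                bind = int((np.sqrt(kx**2 + ky**2 + kz**2) - kmin)/dk)
                # Assumption: out-of-range bins are dropped on both sides.
                if bind < 0 or bind >= Nk:
                    continue
                delk = delta_k[x, y, z]
                Pk[bind] += (delk.real**2 + delk.imag**2)/2
                Numk[bind] += 1
    return Pk, Numk

def vectorized_guarded(Ngrid, LMAX, kmin, dk, Nk, delta_k):
    KXYZ = 2*np.pi/LMAX*(np.arange(Ngrid) - Ngrid/2)
    KK = np.sqrt(KXYZ[:, None, None]**2 + KXYZ[:, None]**2 + KXYZ**2)
    BIND = ((KK - kmin)/dk).astype(int)
    valid_mask = (BIND >= 0) & (BIND < Nk)  # lower bound added
    IDs = BIND[valid_mask]
    vals = (delta_k.real[valid_mask]**2 + delta_k.imag[valid_mask]**2)/2
    Pk = np.bincount(IDs, vals, minlength=Nk)
    Numk = np.bincount(IDs, minlength=Nk).astype(float)
    return Pk, Numk

# Usage: kmin > dk, so the cells closest to |k| = 0 fall into negative bins.
rng = np.random.default_rng(0)
Ngrid = 8
delta_k = rng.random((Ngrid,)*3) + 1j*rng.random((Ngrid,)*3)
p1, n1 = loopy_guarded(Ngrid, 3.45, 3.0, 1.0, 10, delta_k)
p2, n2 = vectorized_guarded(Ngrid, 3.45, 3.0, 1.0, 10, delta_k)
```

With the inputs used in the timings (kmin = 0.345 < dk = 1.56) no bin index is negative, so the guard does not change those results.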
Input setup:
# Setup inputs with random numbers
Ngrid = 100
LMAX = 3.45
kmin = 0.345
dk = 1.56
Nk = 80
delta_k = np.random.rand(Ngrid,Ngrid,Ngrid) + 1j * \
np.random.rand(Ngrid,Ngrid,Ngrid)
Timings:
In [186]: app1_out1, app1_out2 = loopy_app(Ngrid, LMAX, kmin, dk, Nk, delta_k)
...: app2_out1, app2_out2 = vectorized_app(Ngrid, LMAX, kmin, dk, Nk, delta_k)
...: print(np.allclose(app1_out1, app2_out1))
...: print(np.allclose(app1_out2, app2_out2))
...:
True
True
In [187]: %timeit loopy_app(Ngrid, LMAX, kmin, dk, Nk, delta_k)
...: %timeit vectorized_app(Ngrid, LMAX, kmin, dk, Nk, delta_k)
...:
1 loops, best of 3: 2.61 s per loop
10 loops, best of 3: 20.7 ms per loop
In [188]: 2610/20.7
Out[188]: 126.08695652173914
That is a 120x+ speedup on these inputs.