Question

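Suppose I have 4 numpy arrays A, B, C, D, each of size (256,256,1792). I want to go through every element of these arrays and do something with it, but I need to do it in chunks of 256x256x256 cubes.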

My code looks like this:

for l in range(7): 
    x, y, z, t = 0,0,0,0
    for m in range(a.shape[0]):
        for n in range(a.shape[1]):
            for o in range(256*l,256*(l+1)):
                t += D[m,n,o] * constant
                x += A[m,n,o] * D[m,n,o] * constant
                y += B[m,n,o] * D[m,n,o] * constant
                z += C[m,n,o] * D[m,n,o] * constant
    final = (x+y+z)/t
    doOutput(final)

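The code works and outputs exactly what I want, but it is very slow. I've read online that nested for loops like this should be avoided in Python. What is the cleanest solution? (Right now I'm trying to write this part of the code in C and import it somehow via Cython or another tool, but I'd prefer a pure Python solution.)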

Thanks

Addition

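Willem Van Onsem's solution to the first part seems to work well, and I think I understand it. But now I want to modify my values before summing them. It looks like this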

(inside the outer l loop)

R = 0  # accumulator for this block
for m in range(a.shape[0]):
    for n in range(a.shape[1]):
        for o in range(256*l,256*(l+1)):
            R += (D[m,n,o] * constant * (A[m,n,o]**2
                  + B[m,n,o]**2 + C[m,n,o]**2)/t - final**2)
doOutput(R)

Obviously I can't just square the sum, x = (A[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*Dsub).sum()**2*constant, because (A²+B²) != (A+B)². How can I redo this last loop?
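The inequality mentioned above is easy to confirm numerically. A minimal sketch with a hypothetical three-element vector `v`: squaring after summing differs from summing the squares, so any squaring has to happen element-wise, before `.sum()`.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])  # hypothetical sample values

square_of_sum = v.sum() ** 2     # (1 + 2 + 3)**2
sum_of_squares = (v ** 2).sum()  # 1 + 4 + 9

print(square_of_sum, sum_of_squares)  # 36.0 14.0
```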

Answer 1

Since you update t for every combination of m in range(a.shape[0]), n in range(a.shape[1]) and o in range(256*l,256*(l+1)), you can replace:

for m in range(a.shape[0]):
    for n in range(a.shape[1]):
        for o in range(256*l,256*(l+1)):
            t += D[m,n,o]

with:

t += D[:a.shape[0],:a.shape[1],256*l:256*(l+1)].sum()
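To see that the slice really covers the same elements as the triple loop, here is a small sketch on a hypothetical random array. `BLOCK = 8` stands in for 256 so the loop finishes instantly; the sizes are assumptions, not the question's data.

```python
import numpy as np

# Hypothetical small stand-in for D; the real array is (256, 256, 1792).
rng = np.random.default_rng(0)
D = rng.random((4, 5, 24))
BLOCK = 8  # stands in for 256
l = 1      # which block along the last axis

# Triple loop over one block, as in the question
t_loop = 0.0
for m in range(D.shape[0]):
    for n in range(D.shape[1]):
        for o in range(BLOCK * l, BLOCK * (l + 1)):
            t_loop += D[m, n, o]

# The same block via slicing; numpy does the summation in compiled code
t_slice = D[:, :, BLOCK * l:BLOCK * (l + 1)].sum()

print(np.isclose(t_loop, t_slice))  # True
```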

The same goes for the other assignments. So you can rewrite your code as:

for l in range(7): 
    Dsub = D[:a.shape[0],:a.shape[1],256*l:256*(l+1)]
    x = (A[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*Dsub).sum()*constant
    y = (B[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*Dsub).sum()*constant
    z = (C[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*Dsub).sum()*constant
    t = Dsub.sum()*constant
    final = (x+y+z)/t
    doOutput(final)
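A sketch that checks the rewrite against the original loop on small random arrays. Assumptions: `BLOCK`, `N_BLOCKS` and `constant` are hypothetical small values, `a` is taken to be `A`, and results are collected into a list instead of calling `doOutput`. `np.s_` just builds the tuple of slices once so it can be reused for all four arrays.

```python
import numpy as np

BLOCK = 8       # hypothetical stand-in for 256
N_BLOCKS = 3    # hypothetical stand-in for 7
constant = 2.5  # hypothetical value

rng = np.random.default_rng(1)
shape = (4, 5, BLOCK * N_BLOCKS)
A, B, C, D = (rng.random(shape) for _ in range(4))
a = A  # assumption: the question's `a` has the same first two dimensions


def loop_version():
    """The question's triple loop, returning one `final` per block."""
    results = []
    for l in range(N_BLOCKS):
        x = y = z = t = 0.0
        for m in range(a.shape[0]):
            for n in range(a.shape[1]):
                for o in range(BLOCK * l, BLOCK * (l + 1)):
                    t += D[m, n, o] * constant
                    x += A[m, n, o] * D[m, n, o] * constant
                    y += B[m, n, o] * D[m, n, o] * constant
                    z += C[m, n, o] * D[m, n, o] * constant
        results.append((x + y + z) / t)
    return results


def sliced_version():
    """The answer's rewrite: one numpy .sum() per array and block."""
    results = []
    for l in range(N_BLOCKS):
        block = np.s_[:a.shape[0], :a.shape[1], BLOCK * l:BLOCK * (l + 1)]
        Dsub = D[block]
        x = (A[block] * Dsub).sum() * constant
        y = (B[block] * Dsub).sum() * constant
        z = (C[block] * Dsub).sum() * constant
        t = Dsub.sum() * constant
        results.append((x + y + z) / t)
    return results


print(np.allclose(loop_version(), sliced_version()))  # True
```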

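Note that * in numpy is element-wise multiplication, not the matrix product. You could multiply by the constant before summing, but since a sum of products with a constant equals that constant times the sum, I think it is more efficient to do this outside the loop.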
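The factoring argument can be checked directly. In this sketch `X` and `constant` are hypothetical; multiplying inside the sum touches every element, while multiplying afterwards is a single scalar operation, and the two agree up to floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((4, 5, 8))  # hypothetical block
constant = 2.5             # hypothetical value

inside = (X * constant).sum()  # one multiplication per element, then the sum
outside = X.sum() * constant   # the sum first, then one multiplication

print(np.isclose(inside, outside))  # True
```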

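If a.shape[0] is equal to D.shape[0] (and likewise for the other dimensions), you can use : instead of :a.shape[0]. Judging from your question, this seems to be the case. So: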

# only when `a.shape[0] == D.shape[0], a.shape[1] == D.shape[1] (and so for A, B and C)`
for l in range(7): 
    Dsub = D[:,:,256*l:256*(l+1)]
    x = (A[:,:,256*l:256*(l+1)]*Dsub).sum()*constant
    y = (B[:,:,256*l:256*(l+1)]*Dsub).sum()*constant
    z = (C[:,:,256*l:256*(l+1)]*Dsub).sum()*constant
    t = Dsub.sum()*constant
    final = (x+y+z)/t
    doOutput(final)
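Going one step further than the answer (this is an alternative, not part of the original answer): because the last axis splits into equal blocks, it can be reshaped into `(n_blocks, block)` and summed over every other axis, which removes the Python loop over `l` as well. The sketch below assumes hypothetical small sizes and cross-checks against the per-block slicing above.

```python
import numpy as np

BLOCK = 8       # hypothetical stand-in for 256
N_BLOCKS = 3    # hypothetical stand-in for 7
constant = 2.5  # hypothetical value

rng = np.random.default_rng(3)
shape = (4, 5, BLOCK * N_BLOCKS)
A, B, C, D = (rng.random(shape) for _ in range(4))


def blocks(X):
    """Reshape (m, n, N_BLOCKS*BLOCK) to (m, n, N_BLOCKS, BLOCK); block l is X[:, :, l, :]."""
    return X.reshape(X.shape[0], X.shape[1], N_BLOCKS, BLOCK)


Db = blocks(D)
axes = (0, 1, 3)  # sum over everything except the block index
t = Db.sum(axis=axes) * constant
# x + y + z folded into one product, since A*D + B*D + C*D == (A + B + C)*D
xyz = ((blocks(A) + blocks(B) + blocks(C)) * Db).sum(axis=axes) * constant
finals = xyz / t  # shape (N_BLOCKS,): one `final` per block

# Cross-check against the per-block slicing from the answer
expected = []
for l in range(N_BLOCKS):
    lo, hi = BLOCK * l, BLOCK * (l + 1)
    Dsub = D[:, :, lo:hi]
    x = (A[:, :, lo:hi] * Dsub).sum() * constant
    y = (B[:, :, lo:hi] * Dsub).sum() * constant
    z = (C[:, :, lo:hi] * Dsub).sum() * constant
    expected.append((x + y + z) / (Dsub.sum() * constant))

print(finals.shape, np.allclose(finals, expected))  # (3,) True
```

One trade-off: this version creates temporaries as large as a whole input array, whereas the per-block loop only ever allocates one block at a time, so on the full (256, 256, 1792) arrays the loop over `l` may be the more memory-friendly choice.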

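Doing the summation at the numpy level with .sum() improves performance, because values are not converted back and forth between numpy and Python objects, and .sum() runs a tight loop in compiled code.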
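A rough way to see the difference on your own machine is to time both approaches. This sketch uses a hypothetical, deliberately small array so the Python loop finishes quickly; the actual timings depend on hardware and numpy version, so no numbers are claimed here.

```python
import time

import numpy as np

rng = np.random.default_rng(4)
D = rng.random((32, 32, 64))  # hypothetical size, kept small on purpose

start = time.perf_counter()
total_loop = 0.0
for m in range(D.shape[0]):
    for n in range(D.shape[1]):
        for o in range(D.shape[2]):
            total_loop += D[m, n, o]  # each access returns a boxed numpy scalar
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
total_numpy = D.sum()  # a single call; the loop runs in compiled code
numpy_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f} s, numpy: {numpy_seconds:.6f} s")
```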

EDIT:

Your updated question does not change much. You can simply use:

m, n, *_ = a.shape
lo, hi = 256*l, 256*(l+1)
R = (D[:m,:n,lo:hi]*constant*(A[:m,:n,lo:hi]**2 + B[:m,:n,lo:hi]**2 + C[:m,:n,lo:hi]**2)/t - final**2).sum()
doOutput(R)
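A sketch that checks the vectorized `R` against the question's loop for one block. Assumptions: small hypothetical sizes (`BLOCK = 8` instead of 256), a hypothetical `constant`, `a` taken to be `A`, and `t`/`final` computed for the block as in the first part of the answer.

```python
import numpy as np

BLOCK = 8       # hypothetical stand-in for 256
constant = 2.5  # hypothetical value
rng = np.random.default_rng(5)
shape = (4, 5, 3 * BLOCK)
A, B, C, D = (rng.random(shape) for _ in range(4))
a = A  # assumption: same first two dimensions as the data
l = 1
lo, hi = BLOCK * l, BLOCK * (l + 1)

# t and final for this block, as in the first part of the answer
Dsub = D[:, :, lo:hi]
t = Dsub.sum() * constant
final = ((A[:, :, lo:hi] + B[:, :, lo:hi] + C[:, :, lo:hi]) * Dsub).sum() * constant / t

# Loop version from the question's update
R_loop = 0.0
for m in range(a.shape[0]):
    for n in range(a.shape[1]):
        for o in range(lo, hi):
            R_loop += (D[m, n, o] * constant * (A[m, n, o]**2
                       + B[m, n, o]**2 + C[m, n, o]**2) / t - final**2)

# Vectorized version from the answer
m, n, *_ = a.shape
R_vec = (D[:m, :n, lo:hi] * constant
         * (A[:m, :n, lo:hi]**2 + B[:m, :n, lo:hi]**2 + C[:m, :n, lo:hi]**2) / t
         - final**2).sum()

print(np.isclose(R_loop, R_vec))  # True
```

Both versions subtract `final**2` once per element of the block, because that is what the loop in the question does. If the intent was to subtract it only once, the term would go outside the `.sum()`.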

Speeding up nested for loops in Python / iterating through numpy arrays
