Question

I currently have the following double loop in my Python code:

for i in range(a):
    for j in range(b):
        A[:,i]*=B[j][:,C[i,j]]

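(A is a matrix of floats, B is a list of float matrices, and C is a matrix of integers. By matrix I mean an m x n np.array.

To be precise, the sizes are: A: m x a; B: b matrices of size m x l (with l different for each matrix); C: a x b. Here m is very large, a is very large, b is small, and the l's are even smaller than b.)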

I tried to speed it up with:

for j in range(b):
    A[:,:]*=B[j][:,C[:,j]]

But to my surprise, this performed worse.

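More precisely, this did improve performance for small values of m and a (the "large" numbers), but from about m = 7000, a = 700 onwards, the first approach is roughly twice as fast.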
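
To reproduce this comparison, a small timing harness along the following lines could be used. It is only a sketch: `make_inputs` and the sizes passed to it are made up for illustration, and `np.random.default_rng` assumes Python 3 with a reasonably recent NumPy. Each timed call works on a fresh copy of A, so the copy cost is included in both measurements.

```python
import timeit
import numpy as np

def double_loop(A, B, C):
    """Original version: update one column of A at a time."""
    a, b = C.shape
    for i in range(a):
        for j in range(b):
            A[:, i] *= B[j][:, C[i, j]]
    return A

def single_loop(A, B, C):
    """Attempted speed-up: update all columns of A at once for each j."""
    b = C.shape[1]
    for j in range(b):
        A[:, :] *= B[j][:, C[:, j]]
    return A

def make_inputs(m, a, b, max_l, seed=0):
    """Random test data (hypothetical helper); each B[j] gets its own width."""
    rng = np.random.default_rng(seed)
    widths = rng.integers(1, max_l + 1, size=b)
    A = rng.uniform(0.5, 1.5, size=(m, a))
    B = [rng.uniform(0.5, 1.5, size=(m, l)) for l in widths]
    # Column j of C only holds valid column indices of B[j]
    C = np.stack([rng.integers(0, l, size=a) for l in widths], axis=1)
    return A, B, C

if __name__ == "__main__":
    A, B, C = make_inputs(m=2000, a=200, b=4, max_l=3)
    repeats = 5
    for fn in (double_loop, single_loop):
        total = timeit.timeit(lambda: fn(A.copy(), B, C), number=repeats)
        print(fn.__name__, total / repeats, "s per call")
```

Raising m and a towards the values mentioned above should show where the crossover happens on a given machine.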

Is there anything else I can do?

Maybe I could parallelize? But I don't really know how.

(I am not committed to either Python 2 or 3.)

Answer 1

Here's a vectorized approach, assuming B is a list of arrays that all have the same shape -

import numpy as np

# Convert B to a 3D array of shape (b, m, l)
B_arr = np.asarray(B)

# Use advanced indexing to index into the last axis of B array with C
# and then do product-reduction along the second axis.
# Finally, we perform elementwise multiplication with A
A *= B_arr[np.arange(B_arr.shape[0]),:,C].prod(1).T
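
To see what the indexing expression does, here is a small check against the double loop from the question. The sizes and random data are made up, and the names `picked` and `factors` are only introduced here for illustration. Note that the intermediate `picked` array holds a x b x m elements, which may matter for memory at the sizes mentioned in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
m, a, b, l = 6, 5, 3, 4  # small made-up sizes

A = rng.uniform(0.5, 1.5, size=(m, a))
B = [rng.uniform(0.5, 1.5, size=(m, l)) for _ in range(b)]
C = rng.integers(0, l, size=(a, b))

# Reference result: the double loop from the question, on a copy of A
expected = A.copy()
for i in range(a):
    for j in range(b):
        expected[:, i] *= B[j][:, C[i, j]]

B_arr = np.asarray(B)               # shape (b, m, l)

# arange(b) and C broadcast to shape (a, b); because the two index arrays
# are separated by a slice, the broadcast axes come first in the result,
# so picked[i, j] is the column B[j][:, C[i, j]]
picked = B_arr[np.arange(b), :, C]  # shape (a, b, m)

# Multiply over j, then transpose to line up with A
factors = picked.prod(1).T          # shape (m, a)

assert picked.shape == (a, b, m)
assert np.allclose(A * factors, expected)
```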

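For cases with smaller a, we could instead run a loop that iterates over the length of a. Also, for better performance, it is probably a good idea to store the products in a separate 2D array and perform the elementwise multiplication with A only once, after leaving the loop.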

Thus, we would have an alternative implementation like so -

range_arr = np.arange(B_arr.shape[0])
out = np.empty_like(A)
for i in range(a):
    out[:,i] = B_arr[range_arr,:,C[i,:]].prod(0)
A *= out
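
Both variants rely on B stacking into a single 3D array, while the question says l differs from matrix to matrix. One possible workaround, sketched here under the assumption that padding is acceptable, is to pad every matrix to the largest width before stacking. `stack_padded` and `multiply_columns` are hypothetical helper names, not part of NumPy. The padded columns are never read as long as C[:, j] only contains valid column indices of B[j].

```python
import numpy as np

def stack_padded(B):
    """Stack matrices of shape (m, l_j) into one (b, m, max_l) array.

    Columns beyond each matrix's own width are filled with ones.
    """
    m = B[0].shape[0]
    max_l = max(Bj.shape[1] for Bj in B)
    B_arr = np.ones((len(B), m, max_l), dtype=B[0].dtype)
    for j, Bj in enumerate(B):
        B_arr[j, :, :Bj.shape[1]] = Bj
    return B_arr

def multiply_columns(A, B, C):
    """Loop over the columns of A and multiply into A once at the end."""
    B_arr = stack_padded(B)
    range_arr = np.arange(B_arr.shape[0])
    out = np.empty_like(A)
    for i in range(A.shape[1]):
        # Rows of the (b, m) selection are B[j][:, C[i, j]] for each j
        out[:, i] = B_arr[range_arr, :, C[i, :]].prod(0)
    A *= out  # in place, as in the original snippet
    return A
```

Padding costs extra memory when the widths differ a lot; in that case looping over j with the original list may be the simpler choice.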
