Question

我在Python中编写了一些代码很好但速度很慢的代码;我认为由于for循环。我希望可以使用numpy命令加快以下操作。让我来定义目标。

假设我有一个尺寸为all_CMs x row的2D numpy数组col。例如，考虑一个6 x 11数组（见下图）。

我想计算所有行的平均值，即 sum ⱼaᵢⱼ导致数组。这当然可以轻松完成。（我将此值称为CM_tilde）
现在，对于每一行，我想计算一些选定值的平均值，即通过计算它们的总和并将其除以所有列的数量来计算低于某个阈值的所有值（ N）。如果该值高于此定义的阈值，则添加CM_tilde值（整行的平均值）。此值称为CM
然后，从行中的每个元素中减去CM值

除此之外，我想要一个numpy数组或列表，列出所有这些CM值。

图：

以下代码正在运行，但速度很慢（特别是如果数组变大）

CM_tilde = np.mean(data, axis=1)
N = data.shape[1]
data_cm = np.zeros(( data.shape[0], data.shape[1], data.shape[2] ))
all_CMs = np.zeros(( data.shape[0], data.shape[2]))
for frame in range(data.shape[2]):
    for row in range(data.shape[0]):
        CM=0
        for col in range(data.shape[1]):
            if data[row, col, frame] < (CM_tilde[row, frame]+threshold):
               CM += data[row, col, frame]
            else:
               CM += CM_tilde[row, frame]
        CM = CM/N
        all_CMs[row, frame] = CM
        # calculate CM corrected value
        for col in range(data.shape[1]):
            data_cm[row, col, frame] = data[row, col, frame] - CM
    print "frame: ", frame
return data_cm, all_CMs

有什么想法吗？

Answer 1

很容易想象你正在做的事情：

import numpy as np

#generate dummy data
nrows=6
ncols=11
nframes=3
threshold=0.3
data=np.random.rand(nrows,ncols,nframes)

CM_tilde = np.mean(data, axis=1)
N = data.shape[1]

all_CMs2 = np.mean(np.where(data < (CM_tilde[:,None,:]+threshold),data,CM_tilde[:,None,:]),axis=1)
data_cm2 = data - all_CMs2[:,None,:]

将其与原件进行比较：

In [684]: (data_cm==data_cm2).all()
Out[684]: True

In [685]: (all_CMs==all_CMs2).all()
Out[685]: True

逻辑是我们同时处理大小为[nrows,ncols,nframes]的数组。主要技巧是通过将大小为CM_tilde的{{1}}转换为大小为[nrows,nframes]的{{1}}来利用python的广播。然后，Python将为每列使用相同的值，因为这是此修改后的CM_tilde[:,None,:]的单个维度。

使用[nrows,1,nframes]我们选择（基于CM_tilde）是否要获取np.where的相应值，或者再次使用threshold的广播值。 data的新用法允许我们计算CM_tilde。

在最后一步中，我们通过直接从np.mean的相应元素中减去这个新all_CMs2来使用广播。

通过查看临时变量的隐式索引，这可能有助于以这种方式向量化代码。我的意思是你的临时变量all_CMs2存在于data的循环内，并且每次迭代都会重置其值。这意味着CM实际上是一个数量[nrows,nframes]（后来明确分配给二维数组CM），从这里很容易看出你可以通过在其列维度上总结适当的CM[row,frame]数量。如果有帮助，您可以为此目的将all_CMs部分命名为CMtmp[row,col,frames]，然后从中计算np.where(...)。显然，结果相同，但可能更透明。

Answer 2

这是我的函数矢量化。我从内到外工作，并在我进行时评论了早期版本。因此，我向量化的第一个循环具有###注释标记。

它不像@Andras's那样干净且理智，但希望它是教学性的，让我们知道如何逐步解决这个问题。

def foo2(data, threshold):
    CM_tilde = np.mean(data, axis=1)
    N = data.shape[1]
    #data_cm = np.zeros(( data.shape[0], data.shape[1], data.shape[2] ))
    ##all_CMs = np.zeros(( data.shape[0], data.shape[2]))
    bmask = data < (CM_tilde[:,None,:] + threshold)
    CM = np.zeros_like(data)
    CM[:] = CM_tilde[:,None,:]
    CM[bmask] = data[bmask]
    CM = CM.sum(axis=1)
    CM = CM/N
    all_CMs = CM.copy()
    """
    for frame in range(data.shape[2]):
        for row in range(data.shape[0]):
            ###print(frame, row)
            ###mask = data[row, :, frame] < (CM_tilde[row, frame]+threshold)
            ###print(mask)
            ##mask = bmask[row,:,frame]
            ##CM = data[row, mask, frame].sum()
            ##CM += (CM_tilde[row, frame]*(~mask)).sum()

            ##CM = CM/N
            ##all_CMs[row, frame] = CM
            ## calculate CM corrected value
            #for col in range(data.shape[1]):
            #    data_cm[row, col, frame] = data[row, col, frame] - CM[row,frame]
        print "frame: ", frame
    """
    data_cm = data - CM[:,None,:]
    return data_cm, all_CMs

这个小测试用例的输出匹配，这比任何帮助我获得正确尺寸的更多。

threshold = .1
data = np.arange(4*3*2,dtype=float).reshape(4,3,2)

numpy vectorization而不是for循环

2 个答案: