Summing grouped items in a numpy array without a loop

Time: 2017-12-05 03:13:36

Tags: python numpy

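I am trying to sum the non-zero values in array a that share the same value in array label. All of those values should then be set to 0, except one of them, which should be replaced with their sum: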

import numpy as np
a =    np.array([[0,0,0,5,5,0],
                 [1,1,0,2,2,0],
                 [0,0,0,0,2,0],
                 [0,0,0,0,0,0],
                 [0,0,0,3,3,3]])

label = np.array([[0,1,2,3,3,3],
                  [1,1,4,4,4,3],
                  [1,4,4,5,4,6],
                  [1,4,4,4,7,8],
                  [9,5,5,5,5,5]])

#should produce the following result:
result =        [[0,0,0,0,0,10],
                 [2,0,0,0,6,0],
                 [0,0,0,0,0,0],
                 [0,0,0,0,0,0],
                 [0,0,0,0,9,0]]

It doesn't matter where the sum is placed. I can't think of any way to do this other than a loop:

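a_ = a.ravel()
labels_ = label.ravel()
# distinct labels that have at least one positive value in a
list_of_labels = np.unique(label[a > 0])

for item in list_of_labels:
    # sum the positive values of a that carry this label
    summ = np.sum(a_[np.argwhere((a_ > 0) & (labels_ == item))])
    print(summ)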
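The loop above only prints the per-label sums. As a reference point (for example, to check a vectorized version against), here is a minimal sketch of a complete loop-based version that also builds the result array. The function name group_sum_loop is hypothetical. Writing each sum at the group's first non-zero cell is an arbitrary choice, which is allowed because the position does not matter. The sketch treats any non-zero value, not only positive ones, as part of a group.

```python
import numpy as np

a = np.array([[0, 0, 0, 5, 5, 0],
              [1, 1, 0, 2, 2, 0],
              [0, 0, 0, 0, 2, 0],
              [0, 0, 0, 0, 0, 0],
              [0, 0, 0, 3, 3, 3]])

label = np.array([[0, 1, 2, 3, 3, 3],
                  [1, 1, 4, 4, 4, 3],
                  [1, 4, 4, 5, 4, 6],
                  [1, 4, 4, 4, 7, 8],
                  [9, 5, 5, 5, 5, 5]])

def group_sum_loop(a, label):
    """Hypothetical loop-based reference: for every label with a non-zero
    entry in a, write the group's sum at its first non-zero cell and
    leave every other cell at 0."""
    a_flat = a.ravel()
    label_flat = label.ravel()
    result = np.zeros_like(a_flat)
    for item in np.unique(label_flat[a_flat != 0]):
        mask = (a_flat != 0) & (label_flat == item)
        # flat index of the first non-zero member of this group
        first = np.flatnonzero(mask)[0]
        result[first] = a_flat[mask].sum()
    return result.reshape(a.shape)

result = group_sum_loop(a, label)
print(result)
```

Each group total is preserved, so result.sum() equals a.sum(). Only the positions of the sums differ from the result shown in the question.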

1 answer:

Answer 0 (score: 1)

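You can get the per-label sums with the weights parameter of np.bincount, and then write each sum at the position of that label's last occurrence. If I'm not mistaken, np.bincount is O(n), and so is the rest of the code below: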

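# get the sums: cnts[k] is the total of a over all cells with label k
cnts = np.bincount(label.ravel(), a.ravel())
# next two lines get indices of the last occurrence of each label
# (this relies on the last write winning for repeated indices);
# labels that never occur keep -1, which points at the spare slot below
psns = np.full(cnts.shape, -1, dtype=int)
psns[label.ravel()] = np.arange(label.size)
# now plug the sums at the appropriate positions; resflat has one extra
# slot at the end to absorb the -1 entries, and it is sliced off afterwards
resflat = np.zeros((a.size + 1,), dtype=a.dtype)
resflat[psns] = cnts
result = resflat[:-1].reshape(a.shape)
result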
# array([[ 0,  0,  0,  0,  0,  0],
#        [ 0,  0,  0,  0,  0, 10],
#        [ 0,  0,  0,  0,  0,  0],
#        [ 2,  0,  0,  6,  0,  0],
#        [ 0,  0,  0,  0,  0,  9]])
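The approach above assumes the labels are small non-negative integers, because np.bincount allocates one bin for every value from 0 up to label.max() and rejects negative input. When that does not hold, one possible variant (a sketch, not part of the answer) is to map the labels to dense group ids with np.unique(..., return_inverse=True) first. The same call with return_index=True also yields each label's first occurrence, so this sketch writes each sum there instead of at the last occurrence. The function name group_sum_first is hypothetical.

```python
import numpy as np

a = np.array([[0, 0, 0, 5, 5, 0],
              [1, 1, 0, 2, 2, 0],
              [0, 0, 0, 0, 2, 0],
              [0, 0, 0, 0, 0, 0],
              [0, 0, 0, 3, 3, 3]])

label = np.array([[0, 1, 2, 3, 3, 3],
                  [1, 1, 4, 4, 4, 3],
                  [1, 4, 4, 5, 4, 6],
                  [1, 4, 4, 4, 7, 8],
                  [9, 5, 5, 5, 5, 5]])

def group_sum_first(a, label):
    """Hypothetical variant: put each label's total at that label's
    first occurrence and zeros everywhere else."""
    lab = label.ravel()
    # uniq: sorted distinct labels; first: flat index of each label's first
    # occurrence; inv: for every cell, the index of its label within uniq
    uniq, first, inv = np.unique(lab, return_index=True, return_inverse=True)
    # per-group sums over the dense ids 0..len(uniq)-1 (float64 due to weights)
    sums = np.bincount(inv, weights=a.ravel(), minlength=len(uniq))
    result = np.zeros(a.size, dtype=a.dtype)
    # every label occurs at least once, so each index in first is valid;
    # the float sums are cast back to a.dtype on assignment
    result[first] = sums
    return result.reshape(a.shape)

result = group_sum_first(a, label)
print(result)
```

The trade-off is speed. np.unique sorts, so this variant is O(n log n) rather than O(n), but it accepts arbitrary labels. For example, shifting every label by -100 gives the same result here, whereas np.bincount(label.ravel() - 100, a.ravel()) raises a ValueError.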