Numpy (or scipy) frequency count along columns in a 2D array

Asked: 2016-11-08 23:41:57

Tags: python arrays numpy

I have a 2D array like this:

array([[ 1,  0, -1],
       [ 1,  1,  0],
       [-1,  0,  1],
       [ 0,  1,  0]])

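I want to get the most frequent value in each column. For the matrix above, I would like to get [1, 0, 0] (or [1, 1, 0], since 0 and 1 both appear twice in the second column).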

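I have looked at numpy.unique, but it only works on 1D arrays. bincount does not work because my array contains negative numbers. I also need a vectorized implementation, since my matrix has thousands of rows.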

2 answers:

Answer 0: (score: 1)

You can try the following:

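import numpy as np
from collections import Counter

# Create your matrix
a = np.array([[ 1,  0, -1],
              [ 1,  1,  0],
              [-1,  0,  1],
              [ 0,  1,  0]])

# Loop over each column to get the most frequent element and its count
for i in range(a.shape[1]):
    # tolist() converts NumPy scalars to plain Python ints before counting
    count = Counter(a[:, i].tolist())
    # most_common(1) returns a one-element list of (value, count)
    print(count.most_common(1))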

Output:

[(1, 2)]  # first column: 1 appears most often (twice)
[(0, 2)]  # second column: 0 appears twice (tied with 1)
[(0, 2)]  # third column: 0 also appears twice
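
As a variation on the same per-column loop, here is a minimal sketch that uses np.unique with return_counts=True on each column and collects the results into an array. The function name column_modes_unique is made up for this illustration. It still loops over columns in Python, so it is not vectorized, and when several values tie it returns the smallest one.

```python
import numpy as np

def column_modes_unique(a):
    """Most frequent value of each column; ties go to the smallest value."""
    a = np.asarray(a)
    modes = []
    for col in a.T:  # a.T yields one column of `a` per iteration
        # values come back sorted, counts[i] is the frequency of values[i]
        values, counts = np.unique(col, return_counts=True)
        modes.append(values[counts.argmax()])
    return np.array(modes)

a = np.array([[ 1,  0, -1],
              [ 1,  1,  0],
              [-1,  0,  1],
              [ 0,  1,  0]])

print(column_modes_unique(a))  # [1 0 0]
```

Because np.unique sorts the values, negative numbers need no special handling here.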

Answer 1: (score: 1)

There is a trick for making np.bincount work with negative numbers:

>>> c = np.array([1,  1, -1,  0])  # array containing a negative number
>>> d = c - c.min() + 1  # shifted copy whose minimum is 1; the offset is c.min() - 1
>>> freq = np.bincount(d)  # count occurrences in the shifted array
>>> freq
array([0, 1, 1, 2])
>>> # freq[k] is the number of times the value k + c.min() - 1 occurs in c
>>> np.argmax(freq) + c.min() - 1  # now add the offset back, since d was only a shifted copy
1

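Now, with this trick, you can loop over each column and find its most common element. Admittedly, though, this solution is not vectorized. As @Jesse Butterfield pointed out, another post uses scipy.stats.mode for this case, but that function has been criticized for becoming slow on large matrices with many unique elements. Which approach is best is probably something to settle by empirical trials.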
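
One possible way to apply the offset trick to all columns at once is sketched below. This is not the answerer's code: the function name column_modes_bincount is hypothetical, and the sketch assumes an integer array whose value range (max minus min) is small, because it allocates one bin per possible value per column. Each column's shifted values are moved into a separate block of bins, so a single np.bincount call counts every column.

```python
import numpy as np

def column_modes_bincount(a):
    """Most frequent value per column of an integer 2D array, without a Python loop.

    Assumes a small value range: memory use is n_cols * (max - min + 1) bins.
    Ties go to the smallest value, because argmax returns the first maximum.
    """
    a = np.asarray(a)
    lo = a.min()
    shifted = a - lo                     # all values are now >= 0
    n_vals = int(shifted.max()) + 1      # number of bins needed per column
    n_cols = a.shape[1]
    # Move column j into its own block of bins: [j * n_vals, (j + 1) * n_vals)
    keys = shifted + np.arange(n_cols) * n_vals
    counts = np.bincount(keys.ravel(), minlength=n_cols * n_vals)
    counts = counts.reshape(n_cols, n_vals)  # row j holds the counts for column j
    return counts.argmax(axis=1) + lo        # undo the shift

a = np.array([[ 1,  0, -1],
              [ 1,  1,  0],
              [-1,  0,  1],
              [ 0,  1,  0]])

print(column_modes_bincount(a))  # [1 0 0]
```

Whether this beats the per-column loop or scipy.stats.mode depends on the array shape and the value range, so it is worth timing on your own data.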