Question

I have a 2D array like this:

array([[ 1,  0, -1],
       [ 1,  1,  0],
       [-1,  0,  1],
       [ 0,  1,  0]])

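I want to get the most frequent value in each column. For the matrix above, I would like to get [1,0,0] (or [1,1,0], since 0 and 1 both appear twice in the second column).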

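I have looked at numpy.unique, but it only takes a 1-D array. bincount won't work because my array contains negative numbers. I also need a vectorized implementation (my matrix has thousands of rows).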

Answer 1

You can try the following:

import numpy as np
from collections import Counter

# Create your matrix
a = np.array([[ 1,  0, -1],
              [ 1,  1,  0],
              [-1,  0,  1],
              [ 0,  1,  0]])

# Loop over each column and print its most frequent element with its count
for i in range(a.shape[1]):
    count = Counter(a[:, i].tolist())  # tolist() yields plain Python ints
    print(count.most_common(1))

Output:

[(1, 2)]  # First column: 1 appears most often (twice)
[(0, 2)]  # Second column: 0 appears twice (so does 1; Counter reports the value seen first)
[(0, 2)]  # Third column: 0 appears twice
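If you want the modes collected into a single list like the [1,0,0] in the question, rather than printed per column, a minimal sketch with the same Counter approach is shown below. The list comprehension over a.T is an illustrative variation, not part of the original answer, and it is still a Python-level loop, so it is not vectorized.

```python
import numpy as np
from collections import Counter

a = np.array([[ 1,  0, -1],
              [ 1,  1,  0],
              [-1,  0,  1],
              [ 0,  1,  0]])

# Iterate over columns (the rows of the transpose); tolist() gives plain ints.
# most_common(1)[0][0] is the value with the highest count; on ties, Counter
# keeps the value encountered first in the column.
modes = [Counter(col.tolist()).most_common(1)[0][0] for col in a.T]
print(modes)  # [1, 0, 0]
```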

Answer 2

There is a trick for making np.bincount work with negative numbers:

>>> c = np.array([1,  1, -1,  0])  # array with a negative number
>>> d = c - c.min() + 1  # shifted copy whose minimum is 1; the offset is c.min() - 1
>>> freq = np.bincount(d)  # count frequencies in the shifted copy
>>> freq  # bin k holds the count of value k + c.min() - 1 in the original array
array([0, 1, 1, 2])
>>> np.argmax(freq) + c.min() - 1  # add the offset back, since d is only a shifted copy
1
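Applied to each column of the question's matrix, the trick can be wrapped in a small helper. This is a sketch, and column_mode_bincount is a hypothetical name chosen for illustration. It shifts by c.min() alone so the smallest value lands in bin 0; the extra +1 in the transcript above is harmless but not required.

```python
import numpy as np

def column_mode_bincount(col):
    """Hypothetical helper: most frequent value of a 1-D int array (negatives allowed)."""
    offset = col.min()
    # Shift so the smallest value maps to bin 0; bincount needs non-negative ints.
    freq = np.bincount(col - offset)
    # argmax returns the first maximal bin, so ties go to the smallest value.
    return int(np.argmax(freq) + offset)

a = np.array([[ 1,  0, -1],
              [ 1,  1,  0],
              [-1,  0,  1],
              [ 0,  1,  0]])

# Still a Python loop over columns, as the answer notes.
modes = [column_mode_bincount(a[:, i]) for i in range(a.shape[1])]
print(modes)  # [1, 0, 0]
```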

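With this trick, you can now loop over each column and find its most frequent element. Admittedly, though, this solution is not vectorized. As @Jesse Butterfield pointed out, another post uses scipy.stats.mode for this case, but it has been criticized as slow on large matrices with many unique elements. Which approach is fastest is probably best settled by benchmarking on your own data.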
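The same offset idea can, under some assumptions, be vectorized across columns. This technique is not taken from either answer: it shifts the whole matrix once, gives every column its own block of bins, runs a single np.bincount over the flattened array, and takes argmax per block. column_modes is a hypothetical name, and the sketch assumes an integer array.

```python
import numpy as np

def column_modes(a):
    """Hypothetical helper: per-column mode of a 2-D integer array, no Python loop."""
    a = np.asarray(a)
    n_cols = a.shape[1]
    lo = a.min()
    shifted = a - lo                    # all values now in [0, span)
    span = int(shifted.max()) + 1       # number of bins needed per column
    # Column j uses its own bin range [j*span, (j+1)*span).
    keys = shifted + np.arange(n_cols) * span
    counts = np.bincount(keys.ravel(), minlength=n_cols * span)
    counts = counts.reshape(n_cols, span)   # row j is the histogram of column j
    # argmax picks the first maximal bin, so ties resolve to the smallest value.
    return counts.argmax(axis=1) + lo

a = np.array([[ 1,  0, -1],
              [ 1,  1,  0],
              [-1,  0,  1],
              [ 0,  1,  0]])
print(column_modes(a))  # [1 0 0]
```

The histogram has n_cols * span entries, so this trades memory for speed: it suits integer data with a modest value range, and degrades when values are spread over a very wide range.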

Numpy (or scipy) frequency count along columns in a 2D array
