I have a 2D array like this:
array([[ 1,  0, -1],
       [ 1,  1,  0],
       [-1,  0,  1],
       [ 0,  1,  0]])
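I want to get the most frequent value in each column. For the matrix above, I would like to get [1, 0, 0] (or [1, 1, 0], since 0 and 1 each appear twice in the second column).
I looked at numpy.unique, but it only works on 1D arrays. bincount does not work because my array contains negative numbers. I also need a vectorized implementation (because my matrix has thousands of rows).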
Answer 0 (score: 1)
You can try the following:
import numpy as np
from collections import Counter
# Create your matrix
a = np.array([[ 1, 0, -1],
              [ 1, 1,  0],
              [-1, 0,  1],
              [ 0, 1,  0]])
# Loop over each column to get the most frequent element and its count
for i in range(a.shape[1]):
    # tolist() converts NumPy scalars to plain Python ints before counting
    count = Counter(a[:, i].tolist())
    print(count.most_common(1))
Output:
[(1, 2)]  # first column: 1 appears most often (twice)
[(0, 2)]  # second column: 0 appears twice (tied with 1)
[(0, 2)]  # third column: 0 appears twice as well
Answer 1 (score: 1)
There is a trick for making np.bincount work with negative numbers:
>>> import numpy as np
>>> c = np.array([1, 1, -1, 0])  # array with a negative number
>>> d = c - c.min() + 1  # shifted copy whose minimum is 1; adding c.min() - 1 undoes the shift
>>> freq = np.bincount(d)  # count frequencies
>>> freq
array([0, 1, 1, 2])  # frequencies of the shifted array; index i holds the count of the original value i + c.min() - 1
>>> np.argmax(freq) + c.min() - 1  # now add the offset back, since d was only a shifted copy
1
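Now, with this trick, you can loop over each column and find its most frequent element. Admittedly, though, this solution is not vectorized. As @Jesse Butterfield pointed out, another post uses scipy.stats.mode to handle this case, but it has been criticized for becoming slow on large matrices with many unique elements. Which approach is best is probably best left to empirical testing.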
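For what it's worth, the same shifting trick can probably be pushed a bit further to avoid the Python loop over columns. The following is only a sketch, not something from the posts mentioned above: the function name column_modes is made up for illustration, and it assumes an integer array whose value range (max minus min) is small enough that a table of n_cols * range counts fits comfortably in memory. Each column gets its own block of bins, so a single np.bincount call counts all columns at once.

```python
import numpy as np

def column_modes(a):
    """Most frequent value per column (hypothetical helper, integer input assumed)."""
    a = np.asarray(a)
    lo = a.min()
    shifted = a - lo                      # all values are now >= 0
    k = shifted.max() + 1                 # number of bins needed per column
    n_cols = a.shape[1]
    # Offset column j by j * k so that columns never share a bin
    flat = shifted + np.arange(n_cols) * k
    counts = np.bincount(flat.ravel(), minlength=n_cols * k).reshape(n_cols, k)
    # argmax picks the smallest value on ties; add the shift back
    return counts.argmax(axis=1) + lo

a = np.array([[ 1, 0, -1],
              [ 1, 1,  0],
              [-1, 0,  1],
              [ 0, 1,  0]])
print(column_modes(a))  # [1 0 0]
```

On ties this returns the smallest of the tied values, which gives [1, 0, 0] for the matrix in the question. If the values are spread over a very wide range, the count table gets large, and the loop-based versions may well be the better choice.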