Question

I have a dataset (of numbers) like this:

I want to get the most frequent numbers across all columns.

I tried

# loop through all the columns
for i in numeros[:16]:
    print(numeros[i].value_counts().idxmax())

and it returns

1,7,12,5,8,17,14,9,20,2,6,4,14,2,21

But this only returns the most frequent number in each column, right? How can I get the 15 most frequent numbers across my whole dataset?

Answer 1

A pandas solution:

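import numpy as np
import pandas as pd
# sample data: 9 rows x 15 columns of random integers in [1, 100)
df = pd.DataFrame(np.random.randint(1,100,(9,15)))
df = df.stack().to_frame('key')  # flatten all columns into a single column named 'key'
df['value'] = 1  # helper column so count() has something to count per key
# count occurrences per key, sort descending, keep the 15 most frequent
df.groupby('key').count().sort_values(['value'],ascending=False).iloc[:15]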
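
The same stack-then-count idea can be sketched on a small fixed frame, so the result is reproducible. This is only an illustration: the frame `small` and the name `top` are made up here, and `groupby(...).size()` stands in for the helper 'value' column used above.

```python
import pandas as pd

# Hypothetical 3x3 frame with known contents (not the asker's data).
small = pd.DataFrame([[1, 2, 2],
                      [3, 2, 1],
                      [1, 2, 4]])

# stack() flattens every column into one long Series.
stacked = small.stack()

# Group the values by themselves, count each group, sort by count
# (highest first) and keep the 2 most frequent values.
top = stacked.groupby(stacked).size().sort_values(ascending=False).head(2)
print(top)
```

Here 2 occurs four times and 1 occurs three times, so those two values come out first. For the question's case, `head(2)` would become `head(15)`.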

Answer 2

Using pd.Series.value_counts:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, (100, 100)))

res = pd.Series(df.values.flatten()).value_counts().head(15)  # 15 most frequent values

The result is a Series of counts, sorted with the highest count first and indexed by the DataFrame values.
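
If only the numbers themselves are wanted (as in the comma-separated output in the question), they can be read from the index of that Series. A minimal sketch, assuming a made-up frame `demo`; the names `counts` and `most_frequent` are illustrative only:

```python
import pandas as pd

# Hypothetical data with known frequencies: 5 occurs 4 times, 9 three times, 7 twice.
demo = pd.DataFrame({"a": [5, 7, 5], "b": [7, 5, 9], "c": [5, 9, 9]})

# Flatten the whole frame to one dimension and count every value.
counts = pd.Series(demo.to_numpy().ravel()).value_counts()

# The index holds the values, ordered from most to least frequent.
most_frequent = counts.head(2).index.tolist()
print(most_frequent)
```

With the question's data, `head(15)` would give the 15 most frequent numbers across the whole dataset rather than one per column.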

Answer 3

Using collections.Counter and its most_common method:

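from collections import Counter

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, (100, 100)))

# most_common(15) returns a list of (value, count) tuples, highest count first
res = pd.DataFrame(Counter(df.values.flatten()).most_common(15))

print(res)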

     0    1
0   64  126
1   72  119
2    1  116
3   14  115
4   28  114
5   67  113
6   16  113
7   56  113
8   84  112
9    3  112
10  19  112
11  13  111
12  94  110
13  52  110
14  66  109
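
In the output above, column 0 is the value and column 1 is its count. As a small sketch of the same Counter approach with labelled columns, using a made-up frame (the names `frame`, `pairs`, `table` and the column labels "number" and "count" are assumptions chosen for illustration):

```python
from collections import Counter

import pandas as pd

# Hypothetical data: 3 occurs 4 times, 8 occurs 3 times, 6 and 7 once each.
frame = pd.DataFrame([[3, 8, 3],
                      [8, 3, 6],
                      [3, 8, 7]])

# Flatten to a plain Python list, then take the 2 most common (value, count) pairs.
pairs = Counter(frame.to_numpy().ravel().tolist()).most_common(2)

# Give the two columns descriptive labels instead of 0 and 1.
table = pd.DataFrame(pairs, columns=["number", "count"])
print(table)
```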

Frequency of numbers in a DataFrame

3 answers: