How to get the n most frequent or highest values per column in python pandas?

Asked: 2020-10-20 12:47:49

Tags: python pandas dataframe

My dataframe looks like this:

df:
    A   B
0   a   g
1   f   g
2   a   g
3   a   d
4   h   d
5   f   a

For the top 2 most frequent values in each column (n = 2), the output should be:

top_df:
    A   B
0   a   g
1   f   d

Thanks

3 answers:

Answer 0 (score: 1)

This should work:

n = 2
df.apply(lambda x: pd.Series(x.value_counts().index[:n]))
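
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's dataframe in code and applies the same one-liner. The inline DataFrame construction and the name `top_df` are only there for illustration. Note that the order among values with equal counts is not something I would rely on, and a column with fewer than n distinct values should end up padded with NaN.

```python
import pandas as pd

# The sample data from the question, built inline for illustration.
df = pd.DataFrame({
    "A": ["a", "f", "a", "a", "h", "f"],
    "B": ["g", "g", "g", "d", "d", "a"],
})

n = 2

# value_counts() sorts by frequency (most frequent first), so the first
# n index labels are the n most frequent values of that column.
top_df = df.apply(lambda col: pd.Series(col.value_counts().index[:n]))

print(top_df)
```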

Answer 1 (score: 0)

Something like this might help:

maxes = dict()
for col in df.columns:
    frequencies = df[col].value_counts()
    # value_counts() sorts by frequency in descending order, so just take the first 2
    maxes[col] = frequencies[:2]
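
The loop above keeps, for each column, a Series of the top values together with their counts. If only the values are wanted, in the layout the question asks for, one possible follow-up is sketched below. It assumes every column has at least n distinct values (otherwise the lists have different lengths and the DataFrame constructor raises), and the inline sample data and the name `top_df` are just for illustration.

```python
import pandas as pd

# The sample data from the question, built inline for illustration.
df = pd.DataFrame({
    "A": ["a", "f", "a", "a", "h", "f"],
    "B": ["g", "g", "g", "d", "d", "a"],
})

n = 2
maxes = {}
for col in df.columns:
    # Series mapping each of the n most frequent values to its count.
    maxes[col] = df[col].value_counts().head(n)

# Keep only the values (the index of each Series), dropping the counts.
top_df = pd.DataFrame({col: counts.index.tolist() for col, counts in maxes.items()})

print(maxes["A"])
print(top_df)
```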

Answer 2 (score: 0)

Solution:
To get the n most frequent values, just take the first n entries of .value_counts() and read off their index:

import pandas as pd

df = pd.read_csv('test.csv')

# METHOD 1: somewhat lengthy and inefficient
top_dict = {}
n_freq_items = 2
top_dict['A'] = df.A.value_counts()[:n_freq_items].index.tolist()
top_dict['B'] = df.B.value_counts()[:n_freq_items].index.tolist()
top_df = pd.DataFrame(top_dict)

print(top_df)

# METHOD 2: shorter and better; taken from @myccha's answer, which I found better
top_df = df.apply(lambda x: pd.Series(x.value_counts()[:n_freq_items].index))
print(top_df)

Input data:

# test.csv
A,B
a,g
f,g
a,g
a,d
h,d
f,a

Output:

   A  B
0  a  g
1  f  d

Note: I took Method 2 from @myccha's answer to this same post. I found that answer more helpful, so I added it here as Method 2.
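
The title also asks about the n highest values, which none of the code above covers. For numeric columns, a rough sketch along the same lines could use Series.nlargest instead of value_counts. The numeric dataframe below is made up for illustration (the question only shows string columns), and `num_df` and `top_vals` are hypothetical names.

```python
import pandas as pd

# Hypothetical numeric data; the question itself only has string columns.
num_df = pd.DataFrame({
    "X": [3, 1, 4, 1, 5],
    "Y": [9, 2, 6, 5, 3],
})

n = 2

# nlargest(n) returns the n largest values of a column in descending order.
# to_numpy() drops the original row labels so the result is indexed 0..n-1.
top_vals = num_df.apply(lambda col: pd.Series(col.nlargest(n).to_numpy()))

print(top_vals)
```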