My DataFrame has the following format:
category | value
cat a    | x
cat a    | x
cat a    | y
cat b    | w
cat b    | z
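For anyone reproducing this, the frame above can be built like this. This is a minimal sketch: the column names and values come from the table, but the DataFrame constructor call is my own assumption about how the data was loaded.

```python
import pandas as pd

# Example frame from the question: one row per observation
df = pd.DataFrame({'category': ['cat a', 'cat a', 'cat a', 'cat b', 'cat b'],
                   'value': ['x', 'x', 'y', 'w', 'z']})
print(df)
```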
I would like to get back something like this, i.e. the most common value in each category and how often it occurs:
category | freq of most common value | most common value
cat a    | 2                         | x
cat b    | 1                         | w   (it doesn't matter whether this is w or z)
Answer 0 (score: 2)
One way is to group by both columns and take the size, then sort the counts and keep the highest frequency in each category:
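import pandas as pd
df = pd.DataFrame({'category': ['cat a', 'cat a', 'cat a', 'cat b', 'cat b'],
                   'value': ['x', 'x', 'y', 'w', 'z']})
# count each (category, value) pair, sort ascending, keep the last (largest) row per category
(df.groupby(['category', 'value'])
   .value.size()
   .sort_values()
   .groupby(level=0)
   .tail(1))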
category  value
cat b     z        1
cat a     x        2
Name: value, dtype: int64
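To see what the chain does before sort_values and tail(1) run, it can help to print the intermediate counts. Below is a minimal sketch that rebuilds the example frame. The variable names counts and top are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'category': ['cat a', 'cat a', 'cat a', 'cat b', 'cat b'],
                   'value': ['x', 'x', 'y', 'w', 'z']})

# Step 1: number of rows for every (category, value) pair (MultiIndex Series)
counts = df.groupby(['category', 'value']).size()
print(counts)

# Step 2: sort ascending so the largest count in each category comes last,
# then keep the last row of each category (level 0 of the MultiIndex)
top = counts.sort_values().groupby(level=0).tail(1)
print(top)
```

Because the default sort is not stable, which of the tied values (w or z) survives for cat b is not guaranteed. The question explicitly allows either.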
Answer 1 (score: 2)
Use Series.value_counts together with Series.head in a lambda function applied to each group:
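import pandas as pd
df = pd.DataFrame({'category': ['cat a', 'cat a', 'cat a', 'cat b', 'cat b'],
                   'value': ['x', 'x', 'y', 'w', 'z']})
# value_counts sorts counts descending, so head(1) keeps the most common value per group;
df = (df.groupby('category', sort=False)['value']
        .apply(lambda x: x.value_counts().head(1))
        .rename_axis(['category', 'most_common_value'])  # name both index levels explicitly
        .reset_index(name='freq of most common value'))   # avoids version-dependent 'level_1'
print(df)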
  category most_common_value  freq of most common value
0    cat a                 x                          2
1    cat b                 w                          1
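The lambda receives each group's value Series. value_counts() returns the counts sorted in descending order, so head(1) keeps only the most frequent value. The minimal sketch below loops over the groups to show what the lambda sees. It rebuilds the example frame, and the dict top is a hypothetical name used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'category': ['cat a', 'cat a', 'cat a', 'cat b', 'cat b'],
                   'value': ['x', 'x', 'y', 'w', 'z']})

top = {}
# Iterating a SeriesGroupBy yields (group name, Series of that group's values)
for name, group in df.groupby('category', sort=False)['value']:
    vc = group.value_counts()      # all counts for this group, largest first
    top[name] = vc.head(1)         # first row = most common value and its count
    print(name, vc.to_dict())
```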
Answer 2 (score: 1)
Here is a solution using crosstab:
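import pandas as pd
df = pd.DataFrame({'category': ['cat a', 'cat a', 'cat a', 'cat b', 'cat b'],
                   'value': ['x', 'x', 'y', 'w', 'z']})
m = pd.crosstab(df['category'], df['value'])  # count table: categories x values
m = m.max(axis=1).to_frame('freq of most common value').assign(most_common_value=m.idxmax(axis=1))
print(m)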
          freq of most common value most_common_value
category
cat a                             2                 x
cat b                             1                 w
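pd.crosstab builds a category-by-value count table. max(axis=1) then takes the largest count in each row, and idxmax(axis=1) returns the column label of the first occurrence of that maximum. Since crosstab orders the value columns (w, x, y, z here), a tie in cat b resolves to w. The minimal sketch below prints the intermediate table. It rebuilds the example frame, and freq and most_common are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({'category': ['cat a', 'cat a', 'cat a', 'cat b', 'cat b'],
                   'value': ['x', 'x', 'y', 'w', 'z']})

# Rows: categories, columns: values, cells: number of occurrences
m = pd.crosstab(df['category'], df['value'])
print(m)

# Largest count per row, and the first column label that holds it
freq = m.max(axis=1)
most_common = m.idxmax(axis=1)
print(freq)
print(most_common)
```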