Suppose I have the following DataFrame:
id class count
0 A 2
0 B 2
0 C 2
0 D 1
1 A 3
1 B 3
1 E 2
2 D 4
2 F 2
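For each id, I want to find the class with the largest count. If several classes tie for the largest count, they should be merged into a single row. For the example above, the result should look like this: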
id class count
0 A,B,C 2
1 A,B 3
2 D 4
How can I do this in pandas?
Answer 0 (score: 3)
# Keep only the rows whose count equals the maximum count of their id group
g = df.groupby('id')
df = df[g['count'].transform('max').eq(df['count'])]
print (df)
id class count
0 0 A 2
1 0 B 2
2 0 C 2
4 1 A 3
5 1 B 3
7 2 D 4
df = df.groupby('id').agg({'class':','.join, 'count':'first'}).reset_index()
print (df)
id class count
0 0 A,B,C 2
1 1 A,B 3
2 2 D 4
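For anyone who wants to run the two steps above end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the sample data in the question, and the names `group_max` and `result` are introduced here only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction).
df = pd.DataFrame({
    'id':    [0, 0, 0, 0, 1, 1, 1, 2, 2],
    'class': ['A', 'B', 'C', 'D', 'A', 'B', 'E', 'D', 'F'],
    'count': [2, 2, 2, 1, 3, 3, 2, 4, 2],
})

# Maximum count of each row's id group, aligned to the original row index.
group_max = df.groupby('id')['count'].transform('max')

# Keep the rows that tie for the maximum, then join their classes per id.
result = (
    df[df['count'].eq(group_max)]
    .groupby('id')
    .agg({'class': ','.join, 'count': 'first'})
    .reset_index()
)
print(result)
```

Because `transform('max')` returns a Series with the same index as `df`, the comparison lines up row by row and no merge is needed.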
Another solution, using a custom function:
def f(x):
    # Keep only the rows of this group that tie for the maximum count
    x = x[x['count'] == x['count'].max()]
    return pd.Series([','.join(x['class'].values.tolist()), x['count'].iat[0]],
                     index=['class', 'count'])
df = df.groupby('id').apply(f).reset_index()
print (df)
id class count
0 0 A,B,C 2
1 1 A,B 3
2 2 D 4
Answer 1 (score: 3)
Option 1
s = df.set_index(['id', 'class'])['count']
s1 = s[s.eq(s.groupby(level=0).max())].reset_index()
s1.groupby(
    ['id', 'count']
)['class'].apply(list).reset_index()[['id', 'class', 'count']]
id class count
0 0 [A, B, C] 2.0
1 1 [A, B] 3.0
2 2 [D] 4.0
Option 2
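import numpy as np

# Pivot to an id x class table of counts (NaN where an id lacks a class)
d1 = df.set_index(['id', 'class'])['count'].unstack()
v = d1.values
m = np.nanmax(v, 1)    # row-wise maximum, ignoring NaN
t = v == m[:, None]    # True where a cell equals its row maximum
# list(s) splits the concatenated labels into characters,
# so this relies on single-character class names
pd.DataFrame({
    'id': d1.index,
    'class': [list(s) for s in t.dot(d1.columns)],
    'count': m
})[['id', 'class', 'count']]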
id class count
0 0 [A, B, C] 2.0
1 1 [A, B] 3.0
2 2 [D] 4.0
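Both options return the classes as lists, while the question asks for a comma-separated string. A possible variation on Option 1 that produces that format is sketched below. It swaps the index-aligned comparison for `transform('max')` and replaces `apply(list)` with `agg(','.join)`; the DataFrame construction and the names `is_max` and `out` are assumptions made for this example.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction).
df = pd.DataFrame({
    'id':    [0, 0, 0, 0, 1, 1, 1, 2, 2],
    'class': ['A', 'B', 'C', 'D', 'A', 'B', 'E', 'D', 'F'],
    'count': [2, 2, 2, 1, 3, 3, 2, 4, 2],
})

# Series of counts indexed by (id, class), as in Option 1.
s = df.set_index(['id', 'class'])['count']

# Broadcast each id's maximum back onto the (id, class) index and compare.
is_max = s.eq(s.groupby(level='id').transform('max'))

# Join the tied classes into one comma-separated string per id.
out = (
    s[is_max]
    .reset_index()
    .groupby(['id', 'count'])['class']
    .agg(','.join)
    .reset_index()[['id', 'class', 'count']]
)
print(out)
```

Since the counts never pass through a pivot with missing values here, the `count` column should stay an integer column rather than becoming float.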