Using pandas, I have the following dataset:
author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00
I want to get the row(s) with the highest value for each author:
author1,category2,15.00
author2,category4,9.00
author3,category1,7.00
author3,category3,7.00
(Apologies, I'm new to pandas.)
Answer 0 (score: 5)
import pandas as pd
df = pd.read_csv("in.csv", names=("Author","Cat","Val"))
print(df.groupby(['Author'])['Val'].max())
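The groupby/max line above returns one number per author, so the category is lost. The sketch below makes that concrete using the question's rows inlined through io.StringIO (an assumption, so no in.csv file is needed). It also shows idxmax, a related idiom not used in either answer: it recovers the category, but because idxmax returns only the label of the first maximum in each group, author3's tied category3 row is dropped.

```python
import io

import pandas as pd

# The question's sample rows, inlined so no in.csv file is needed
data = """author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00
"""
df = pd.read_csv(io.StringIO(data), names=("Author", "Cat", "Val"))

# One maximum per author: a Series indexed by Author, without the Cat column
max_per_author = df.groupby("Author")["Val"].max()
print(max_per_author)

# idxmax gives the row label of the FIRST maximum in each group,
# so author3's second 7.00 row (category3) is not returned
first_max_rows = df.loc[df.groupby("Author")["Val"].idxmax()].reset_index(drop=True)
print(first_max_rows)
```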
To get the matching rows as a DataFrame:
# transform('max') broadcasts each author's maximum back onto every row
inds = df.groupby(['Author'])['Val'].transform('max') == df['Val']
df = df[inds].reset_index(drop=True)
print(df)
    Author        Cat   Val
0  author1  category2  15.0
1  author2  category4   9.0
2  author3  category1   7.0
3  author3  category3   7.0
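For reference, here is the transform-based filter from this answer as a self-contained sketch. The only assumption is that the question's rows are inlined with io.StringIO instead of being read from in.csv. Because transform returns a result aligned with the original rows, the comparison yields a boolean mask of the same length as df, and ties (both author3 rows) are kept.

```python
import io

import pandas as pd

# The question's sample rows, inlined so no in.csv file is needed
data = """author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00
"""
df = pd.read_csv(io.StringIO(data), names=("Author", "Cat", "Val"))

# Each row gets its author's maximum; True where the row's own value equals it
is_max = df.groupby("Author")["Val"].transform("max") == df["Val"]

# Keep the matching rows (original order) and renumber the index from 0
top = df[is_max].reset_index(drop=True)
print(top)
```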
Answer 1 (score: 2)
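Since you also want to retrieve the category column, a standard .agg on the val column can't give you what you need. (Also, because author3 has two values equal to 7, @Padraic Cunningham's approach using .max() returns only one instance instead of both.) You can define a custom apply function to do the job.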
import pandas as pd
# your data; assume the column names are: author, cat, val
# ===============================
print(df)
author cat val
0 author1 category1 10
1 author1 category2 15
2 author1 category3 12
3 author2 category1 5
4 author2 category2 6
5 author2 category3 4
6 author2 category4 9
7 author3 category1 7
8 author3 category2 4
9 author3 category3 7
# processing
# ====================================
def func(group):
    # keep every row in this author's group whose val equals the group maximum (ties included)
    return group.loc[group['val'] == group['val'].max()]
df.groupby('author', as_index=False).apply(func).reset_index(drop=True)
author cat val
0 author1 category2 15
1 author2 category4 9
2 author3 category1 7
3 author3 category3 7
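Another way to keep ties, not used in either answer, is to compute the per-author maximum as a small table and inner-merge it back on both author and val, so only rows whose (author, val) pair matches a maximum survive. The sketch below assumes the lowercase column names from this answer and inlines the question's rows with io.StringIO; compared with apply, it avoids calling a Python function once per group.

```python
import io

import pandas as pd

# The question's sample rows, inlined so no in.csv file is needed
data = """author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00
"""
df = pd.read_csv(io.StringIO(data), names=("author", "cat", "val"))

# Per-author maximum as a DataFrame with columns author, val
maxes = df.groupby("author", as_index=False)["val"].max()

# Inner merge on both columns keeps only rows matching a maximum,
# so both author3 rows with 7.00 are retained
top = df.merge(maxes, on=["author", "val"])
print(top)
```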