I have a DataFrame like this:
Name        Gender  Age  Level
Pikachu     Male    4    8
Charmander  Female  5    7
Charmander  Female  5    7
Squirtle    Male    3    6
Squirtle    Male    3    9
Squirtle    Female  4    9
I want it to look like this:
Name        Gender  Age  Level
Pikachu     Male    4    8
Charmander  Female  5    7
Squirtle    Male    3    9
Squirtle    Female  4    9
I'm not sure how to explain this in plain English, so I'll write it as pseudocode.
Basically:
If Name, Gender and Age are the same:
    If there is a difference in levels:
        Keep the row with the higher level
    If there is a tie:
        Keep a random one
Any ideas are appreciated!
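For anyone who wants to reproduce the problem, here is a minimal sketch that builds the sample frame from the table above and counts how many rows share each (Name, Gender, Age) combination. The variable names `key` and `group_sizes` are illustrative and not part of the original question.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Name": ["Pikachu", "Charmander", "Charmander", "Squirtle", "Squirtle", "Squirtle"],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Age": [4, 5, 5, 3, 3, 4],
    "Level": [8, 7, 7, 6, 9, 9],
})

# Rows that agree on all three of these columns count as duplicates.
key = ["Name", "Gender", "Age"]

# Rows per key; any count above 1 marks a group that needs deduplicating.
group_sizes = df.groupby(key).size()
print(group_sizes)
```

Groups with a size above 1 are the ones where a single row has to be chosen by Level.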
Answer 0 (score: 3)
Use sort_values + drop_duplicates:
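# Sort ascending by Level, then keep the last (highest-Level) row of each (Name, Gender, Age) group
df = df.sort_values('Level').drop_duplicates(['Name', 'Gender', 'Age'], keep='last')
df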
         Name  Gender  Age  Level
2  Charmander  Female    5      7
0     Pikachu    Male    4      8
4    Squirtle    Male    3      9
5    Squirtle  Female    4      9
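The question also asks for a random pick when levels tie, which the sort alone does not provide. One possible sketch, assuming a shuffle before a stable sort is acceptable: `sample(frac=1)` randomizes the row order, and a stable sort then preserves that random order among equal levels. The fixed `random_state` is only there to make the example reproducible and is my assumption, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Pikachu", "Charmander", "Charmander", "Squirtle", "Squirtle", "Squirtle"],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Age": [4, 5, 5, 3, 3, 4],
    "Level": [8, 7, 7, 6, 9, 9],
})
key = ["Name", "Gender", "Age"]

# Shuffle first so that rows tied on Level end up in random relative order.
# random_state is fixed only for reproducibility; drop it for a fresh draw each run.
shuffled = df.sample(frac=1, random_state=0)

result = (
    shuffled.sort_values("Level", kind="mergesort")  # mergesort is stable, so ties keep the shuffled order
    .drop_duplicates(key, keep="last")               # last row per key has the highest Level
    .sort_index()                                    # restore the original row order
)
print(result)
```

The final `sort_index()` is optional; it only puts the surviving rows back in their original order.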
Answer 1 (score: 2)
Use argsort and duplicated:
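# Requires `import numpy as np`. Order rows by Level descending, flag repeated (Name, Gender, Age) rows, realign the mask by index.
df[~df.iloc[np.argsort(-df.Level.to_numpy())].drop(columns='Level').duplicated().sort_index()]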
         Name  Gender  Age  Level
0     Pikachu    Male    4      8
1  Charmander  Female    5      7
4    Squirtle    Male    3      9
5    Squirtle  Female    4      9
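The one-liner above packs several steps together. A step-by-step sketch of the same idea follows, under the assumption that a stable sort is wanted so that ties keep their original order; the intermediate names `order` and `is_dup` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Pikachu", "Charmander", "Charmander", "Squirtle", "Squirtle", "Squirtle"],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Age": [4, 5, 5, 3, 3, 4],
    "Level": [8, 7, 7, 6, 9, 9],
})
key = ["Name", "Gender", "Age"]

# Row positions ordered from highest Level to lowest; a stable sort keeps ties in original order.
order = np.argsort(-df["Level"].to_numpy(), kind="stable")

# Walking the rows in that order, duplicated() flags every row whose key was already seen,
# i.e. every row that is not the first highest-Level row of its group.
is_dup = df.iloc[order].duplicated(subset=key)

# The mask is indexed in sorted order; sort_index() realigns it with df before filtering.
result = df.loc[~is_dup.sort_index()]
print(result)
```

Unlike the sort_values approach, this keeps the surviving rows in their original order without an extra step.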
A groupby + idxmax solution (although slower):
df.loc[df.groupby(['Name', 'Gender', 'Age']).Level.idxmax()]  # idxmax returns index labels, so use .loc rather than .iloc
         Name  Gender  Age  Level
1  Charmander  Female    5      7
0     Pikachu    Male    4      8
5    Squirtle  Female    4      9
4    Squirtle    Male    3      9
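Because idxmax returns index labels rather than positions, the lookup is safest with `.loc`. The sketch below illustrates this with a hypothetical string index ("a" to "f") that I added for demonstration; with the default 0..n-1 index, labels and positions happen to coincide.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Pikachu", "Charmander", "Charmander", "Squirtle", "Squirtle", "Squirtle"],
        "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
        "Age": [4, 5, 5, 3, 3, 4],
        "Level": [8, 7, 7, 6, 9, 9],
    },
    index=list("abcdef"),  # hypothetical non-default labels, for illustration only
)
key = ["Name", "Gender", "Age"]

# For each group, idxmax gives the label of the first row holding the maximum Level.
best_labels = df.groupby(key)["Level"].idxmax()

# Label-based lookup; .iloc would fail here because the labels are not integer positions.
result = df.loc[best_labels]
print(result)
```

On ties, idxmax picks the first occurrence, so this variant is deterministic rather than random.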