Question

Rephrasing an unclear question I asked earlier. I have a df like the one below.

import pandas as pd

data = {'Name':  ['Bill','Bill','Bill','John','John','Greg','Greg','Andy','Tom','Jeff'],
        'age_matches': [1, 0, 0, 0, 1, 0, 0, 0, 1, 0],
        'height_matches': [0, 0, 1, 1, 1, 0, 1, 1, 1, 1],
        'weight_matches' :[0, 1, 0, 1, 0, 1, 0, 0, 1, 0],
        'Score': [1, 1, 1, 2, 2, 1, 1, 1, 3, 1]
        }

df = pd.DataFrame(data, columns=['Name', 'age_matches', 'height_matches', 'weight_matches', 'Score'])


Name    age_matches height_matches  weight_matches  Score
Bill           1          0              0            1
Bill           0          0              1            1
Bill           0          1              0            1
John           0          1              1            2
John           1          1              0            2
Greg           0          0              1            1
Greg           0          1              0            1
Andy           0          1              0            1
Tom            1          1              1            3
Jeff           0          1              0            1

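I am matching some observations on a set of features (age, height, weight). 1 means there is a match, 0 means there is no match. Score is the sum of all matches for an observation. age_matches takes priority over the other matches. If a group (same Name) contains a row with age_matches == 1, I do not want to keep the other records in that group. On the other hand, if a group has no row with age_matches == 1, I want to keep all of its records. The resulting df should look like this: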

Name    age_matches height_matches  weight_matches  Score
Bill           1          0              0            1
John           1          1              0            2
Greg           0          0              1            1
Greg           0          1              0            1
Andy           0          1              0            1
Tom            1          1              1            3
Jeff           0          1              0            1

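In the "Bill" group there is an observation with age_matches == 1, so the other records can be dropped. The same applies to the "John" group. All the other groups are kept as they are. I hope this is clear enough. Any suggestions on how to achieve this? Thanks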

Answer 1

Try this:

Answer 2

What I would do:

df = df[df.groupby('Name').age_matches.transform('max') == df.age_matches]
   Name  age_matches  height_matches  weight_matches  Score
0  Bill            1               0               0      1
4  John            1               1               0      2
5  Greg            0               0               1      1
6  Greg            0               1               0      1
7  Andy            0               1               0      1
8   Tom            1               1               1      3
9  Jeff            0               1               0      1
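
To see why this one-liner works, it may help to split it into steps. The sketch below rebuilds the df from the question and uses the same groupby/transform idea; the names group_max, keep and result are only illustrative and do not appear in the original answer. It assumes age_matches only ever holds 0 or 1, as in the question.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'Name': ['Bill', 'Bill', 'Bill', 'John', 'John', 'Greg', 'Greg', 'Andy', 'Tom', 'Jeff'],
    'age_matches': [1, 0, 0, 0, 1, 0, 0, 0, 1, 0],
    'height_matches': [0, 0, 1, 1, 1, 0, 1, 1, 1, 1],
    'weight_matches': [0, 1, 0, 1, 0, 1, 0, 0, 1, 0],
    'Score': [1, 1, 1, 2, 2, 1, 1, 1, 3, 1],
})

# transform('max') broadcasts each group's maximum back onto its rows,
# so group_max is 1 for every row of a group that has an age match.
group_max = df.groupby('Name')['age_matches'].transform('max')

# A row is kept when its own age_matches equals the group maximum:
#   group contains a 1 -> only the rows with age_matches == 1 survive
#   group is all 0s    -> every row equals the maximum (0), so all survive
keep = df['age_matches'] == group_max

result = df[keep]
print(result)
```

Note that if a group had several rows with age_matches == 1, all of them would be kept; the question does not say whether that case can occur.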

Remove duplicate rows in a Pandas df that do not satisfy a condition