Starting with data like this:
import numpy as np
import pandas as pd

np.random.seed(314)
df = pd.DataFrame({
    'date': [pd.date_range('2016-04-01', '2016-04-05')[r] for r in np.random.randint(0, 5, 20)],
    'cat': ['ABCD'[r] for r in np.random.randint(0, 4, 20)],
    'count': np.random.randint(0, 100, 20)
})
cat count date
0 B 87 2016-04-04
1 A 95 2016-04-05
2 D 89 2016-04-02
3 D 39 2016-04-05
4 A 39 2016-04-01
5 C 61 2016-04-05
6 C 58 2016-04-04
7 B 49 2016-04-03
8 D 20 2016-04-02
9 B 54 2016-04-01
10 B 87 2016-04-01
11 D 36 2016-04-05
12 C 13 2016-04-05
13 A 79 2016-04-04
14 B 91 2016-04-03
15 C 83 2016-04-05
16 C 85 2016-04-05
17 D 93 2016-04-01
18 C 85 2016-04-02
19 B 91 2016-04-03
I want to end up with only the rows where count is the maximum for the corresponding cat:
cat count date
1 A 95 2016-04-05
14 B 91 2016-04-03
16 C 85 2016-04-05
17 D 93 2016-04-01
18 C 85 2016-04-02
19 B 91 2016-04-03
Note that a category can have more than one record with the maximum count, and all of them should be kept.
Answer 0 (score: 2)
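Use transform to broadcast each group's maximum back onto the original rows, then keep the rows whose count equals it: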
df[df['count']==df.groupby('cat')['count'].transform('max')]
Out[163]:
cat count date
1 A 95 2016-04-05
14 B 91 2016-04-03
16 C 85 2016-04-05
17 D 93 2016-04-01
18 C 85 2016-04-02
19 B 91 2016-04-03
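For anyone who wants to reproduce this without relying on the random seed, here is a minimal self-contained sketch. It assumes the cat and count values can be hard-coded from the table printed in the question (the date column is left out because it plays no part in the filter), and the names group_max and result are only illustrative.

```python
import pandas as pd

# Hard-coded copy of the 'cat' and 'count' columns from the question's table
df = pd.DataFrame({
    'cat': list('BADDACCBDBBDCABCCDCB'),
    'count': [87, 95, 89, 39, 39, 61, 58, 49, 20, 54,
              87, 36, 13, 79, 91, 83, 85, 93, 85, 91],
})

# transform('max') returns a Series with the same index as df,
# where each row holds the maximum count of that row's category
group_max = df.groupby('cat')['count'].transform('max')

# The boolean mask keeps every row that equals its category maximum,
# so tied rows (B at 91, C at 85) are all retained
result = df[df['count'] == group_max]
print(result)
```

The comparison against the transformed Series is what preserves ties. An approach based on df.groupby('cat')['count'].idxmax() would return only one row label per category and so would drop the duplicate maxima.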