Given the following DataFrame ...
line_date line_track line_race c1pos
horse_name
Grand Cicero 2013-03-10 GP 9 9
Clever Story 2013-09-13 BEL 7 7
Distorted Dream 2013-10-04 BEL 4 2
Distorted Dream 2013-09-13 BEL 7 5
Distorted Dream 2013-04-27 BEL 6 2
Distorted Dream 2012-10-24 BEL 4 2
Distorted Dream 2012-09-12 BEL 2 3
Distorted Dream 2012-06-30 BEL 8 4
Distorted Dream 2012-06-09 BEL 2 4
Mr. O'Leary 2013-10-13 BEL 5 5
Mr. O'Leary 2013-08-29 SAR 7 6
Mr. O'Leary 2013-05-27 BEL 6 5
In the Dark 2013-10-13 BEL 5 7
In the Dark 2013-09-22 BEL 5 7
In the Dark 2013-08-03 SAR 2 7
In the Dark 2012-11-24 AQU 3 7
In the Dark 2012-10-18 BEL 6 6
Bred to Boss 2013-10-26 PRX 3 5
Bred to Boss 2013-10-06 PRX 6 3
Bred to Boss 2012-08-18 SAR 4 1
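... with the index set to horse_name, I need to "trim" each horse down to a maximum number of records. For example, Distorted Dream has seven records. I need to cut every horse that has more than three records down to three, producing a DataFrame like the one below. Is there a quick and easy way to do this?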
line_date line_track line_race c1pos
horse_name
Grand Cicero 2013-03-10 GP 9 9
Clever Story 2013-09-13 BEL 7 7
Distorted Dream 2013-10-04 BEL 4 2
Distorted Dream 2013-09-13 BEL 7 5
Distorted Dream 2013-04-27 BEL 6 2
Mr. O'Leary 2013-10-13 BEL 5 5
Mr. O'Leary 2013-08-29 SAR 7 6
Mr. O'Leary 2013-05-27 BEL 6 5
In the Dark 2013-10-13 BEL 5 7
In the Dark 2013-09-22 BEL 5 7
In the Dark 2013-08-03 SAR 2 7
Bred to Boss 2013-10-26 PRX 3 5
Bred to Boss 2013-10-06 PRX 6 3
Bred to Boss 2012-08-18 SAR 4 1
Answer 0 (score: 1)
As is so often the case, groupby to the rescue! The docs are worth a read, since they contain a lot of useful tricks.
>>> df.groupby(level=0, sort=False, as_index=False).head(3)
line_date line_track line_race c1pos
horse_name
Grand Cicero 2013-03-10 GP 9 9
Clever Story 2013-09-13 BEL 7 7
Distorted Dream 2013-10-04 BEL 4 2
Distorted Dream 2013-09-13 BEL 7 5
Distorted Dream 2013-04-27 BEL 6 2
Mr. O'Leary 2013-10-13 BEL 5 5
Mr. O'Leary 2013-08-29 SAR 7 6
Mr. O'Leary 2013-05-27 BEL 6 5
In the Dark 2013-10-13 BEL 5 7
In the Dark 2013-09-22 BEL 5 7
In the Dark 2013-08-03 SAR 2 7
Bred to Boss 2013-10-26 PRX 3 5
Bred to Boss 2013-10-06 PRX 6 3
Bred to Boss 2012-08-18 SAR 4 1
Or, if you want the last 3:
>>> df.groupby(level=0, sort=False, as_index=False).tail(3)
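(sort=False is only there to keep the horses in their original order; you can drop it if you don't care about that. Recent pandas versions document head and tail as preserving the original row order and ignoring as_index, so there both arguments are optional.)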
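For anyone who wants to try this without the full data set, here is a minimal, self-contained sketch. It assumes a small subset of the rows from the question, and the names first_three and last_three are just illustrative. as_index=False is left out because head and tail do not use it.

```python
import pandas as pd

# A small subset of the question's data, with horse_name as the index.
df = pd.DataFrame(
    {
        "horse_name": ["Clever Story"] + ["Distorted Dream"] * 4 + ["Mr. O'Leary"] * 3,
        "line_date": [
            "2013-09-13",
            "2013-10-04", "2013-09-13", "2013-04-27", "2012-10-24",
            "2013-10-13", "2013-08-29", "2013-05-27",
        ],
        "line_track": ["BEL", "BEL", "BEL", "BEL", "BEL", "BEL", "SAR", "BEL"],
        "line_race": [7, 4, 7, 6, 4, 5, 7, 6],
        "c1pos": [7, 2, 5, 2, 2, 5, 6, 5],
    }
).set_index("horse_name")

# Group on the index (level=0) and keep at most three rows per horse.
first_three = df.groupby(level=0, sort=False).head(3)  # first three rows as stored
last_three = df.groupby(level=0, sort=False).tail(3)   # last three rows as stored

print(first_three)
print(last_three)
```

Horses with three or fewer records pass through unchanged; only Distorted Dream loses a row here.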
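You can also sort on the line_date column first (converting it to datetime is safer, although YYYY-MM-DD strings already sort correctly as they are) and then pick the first or last three chronologically with the same head / tail approach.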
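A rough sketch of that date-based variant follows. The rows are deliberately shuffled (an assumption made for illustration; the question's data is already newest-first), and by_date and latest_three are hypothetical names.

```python
import pandas as pd

# Rows deliberately out of date order, to show why sorting first matters.
df = pd.DataFrame(
    {
        "horse_name": ["Distorted Dream"] * 4 + ["Bred to Boss"] * 3,
        "line_date": [
            "2013-04-27", "2013-10-04", "2012-10-24", "2013-09-13",
            "2012-08-18", "2013-10-26", "2013-10-06",
        ],
        "c1pos": [2, 2, 2, 5, 1, 5, 3],
    }
).set_index("horse_name")

# Convert to real datetimes so the sort does not depend on string formatting.
df["line_date"] = pd.to_datetime(df["line_date"])

# Newest races first, then keep the three most recent rows per horse.
by_date = df.sort_values("line_date", ascending=False)
latest_three = by_date.groupby(level=0, sort=False).head(3)

print(latest_three)
```

Note that the result stays in date order, so rows for different horses can be interleaved. If you want each horse's rows kept together, sort the result again afterwards. Using tail(3) on the same sorted frame gives the three oldest races instead.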