从Pandas DataFrame中按计数删除行

时间:2013-11-09 23:51:45

标签: python pandas

使用以下DataFrame ...

                     line_date line_track  line_race  c1pos
 horse_name                                                
 Grand Cicero       2013-03-10         GP          9      9
 Clever Story       2013-09-13        BEL          7      7
 Distorted Dream    2013-10-04        BEL          4      2
 Distorted Dream    2013-09-13        BEL          7      5
 Distorted Dream    2013-04-27        BEL          6      2
 Distorted Dream    2012-10-24        BEL          4      2
 Distorted Dream    2012-09-12        BEL          2      3
 Distorted Dream    2012-06-30        BEL          8      4
 Distorted Dream    2012-06-09        BEL          2      4
 Mr. O'Leary        2013-10-13        BEL          5      5
 Mr. O'Leary        2013-08-29        SAR          7      6
 Mr. O'Leary        2013-05-27        BEL          6      5
 In the Dark        2013-10-13        BEL          5      7
 In the Dark        2013-09-22        BEL          5      7
 In the Dark        2013-08-03        SAR          2      7
 In the Dark        2012-11-24        AQU          3      7
 In the Dark        2012-10-18        BEL          6      6
 Bred to Boss       2013-10-26        PRX          3      5
 Bred to Boss       2013-10-06        PRX          6      3
 Bred to Boss       2012-08-18        SAR          4      1

...索引设置为horse_name。我需要将这些中的每一个“修剪”到一定数量。例如,“扭曲的梦”有七个记录。我需要将所有超过三个记录的所有记录减少到三个,因此它会生成一个类似下面的DataFrame。有没有快速简单的方法来做到这一点?

                     line_date line_track  line_race  c1pos
 horse_name                                                
 Grand Cicero       2013-03-10         GP          9      9
 Clever Story       2013-09-13        BEL          7      7
 Distorted Dream    2013-10-04        BEL          4      2
 Distorted Dream    2013-09-13        BEL          7      5
 Distorted Dream    2013-04-27        BEL          6      2
 Mr. O'Leary        2013-10-13        BEL          5      5
 Mr. O'Leary        2013-08-29        SAR          7      6
 Mr. O'Leary        2013-05-27        BEL          6      5
 In the Dark        2013-10-13        BEL          5      7
 In the Dark        2013-09-22        BEL          5      7
 In the Dark        2013-08-03        SAR          2      7
 Bred to Boss       2013-10-26        PRX          3      5
 Bred to Boss       2013-10-06        PRX          6      3
 Bred to Boss       2012-08-18        SAR          4      1

1 个答案:

答案 0 :(得分:1)

经常是groupby来救援!值得一读的是docs,因为有很多有用的技巧可供选择。

>>> df.groupby(level=0, sort=False, as_index=False).head(3)
                  line_date line_track  line_race  c1pos
horse_name                                              
Grand Cicero     2013-03-10         GP          9      9
Clever Story     2013-09-13        BEL          7      7
Distorted Dream  2013-10-04        BEL          4      2
Distorted Dream  2013-09-13        BEL          7      5
Distorted Dream  2013-04-27        BEL          6      2
Mr. O'Leary      2013-10-13        BEL          5      5
Mr. O'Leary      2013-08-29        SAR          7      6
Mr. O'Leary      2013-05-27        BEL          6      5
In the Dark      2013-10-13        BEL          5      7
In the Dark      2013-09-22        BEL          5      7
In the Dark      2013-08-03        SAR          2      7
Bred to Boss     2013-10-26        PRX          3      5
Bred to Boss     2013-10-06        PRX          6      3
Bred to Boss     2012-08-18        SAR          4      1

或者,如果你想要最后3:

>>> df.groupby(level=0, sort=False, as_index=False).tail(3)

sort=False只是为了保留原始的马匹顺序;如果你不关心它,你可以放弃它。)

您还可以对line_date列进行排序(更安全,首先将其转换为datetime,但YYYY-MM-DD字符串将按原样正确排序)并选择第一个或最后三个按时间顺序使用相同的head / tail方法。