Pandas: drop rows before a value, including the value, based on another column

Posted: 2018-04-30 15:42:44

Tags: python pandas

I want to drop every row that comes before the row where Time is 0 and that has the same ID as that row. The row containing the 0 should be dropped as well.

The data looks like:

                          Time  Author         ID
Date
2018-04-23 08:09:52.558    60  1744025         44
2018-04-23 14:26:12.294   360  1244021         10
2018-04-23 15:19:47.667    45  1244021         10
2018-04-23 18:05:25.417   240  1249997         19
2018-04-23 18:58:20.776   180  2185555         19
2018-04-23 18:59:50.883   120  2185555         19
2018-04-23 19:29:30.500   300  1686620         19
2018-04-24 00:23:45.673     0  1249997         19
2018-04-24 06:55:29.529    10  1244021         10
2018-04-24 14:08:19.080   270  1686620         19
2018-04-24 17:58:30.757   120  1416825         39
2018-04-24 19:33:41.127   600  1249997         19

I want it to be:

                          Time  Author         ID
Date
2018-04-23 08:09:52.558    60  1744025         44
2018-04-23 14:26:12.294   360  1244021         10
2018-04-23 15:19:47.667    45  1244021         10
2018-04-24 06:55:29.529    10  1244021         10
2018-04-24 14:08:19.080   270  1686620         19
2018-04-24 17:58:30.757   120  1416825         39
2018-04-24 19:33:41.127   600  1249997         19

I've fiddled with idxmax():

df[(df.Time == 0).idxmax():]

but that doesn't take the ID into account.
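A quick check on the sample data shows why this attempt falls short: idxmax() on a boolean Series returns the label of the first True, so the slice keeps every row from the first 0 onward, whatever its ID. The sketch below rebuilds the sample frame by hand from the question's table and writes the same label slice with .loc; the names first_zero and attempt are illustrative only.

```python
import pandas as pd

# Sample data from the question, typed in by hand, with Date as the index
df = pd.DataFrame(
    {
        "Date": pd.to_datetime([
            "2018-04-23 08:09:52.558", "2018-04-23 14:26:12.294",
            "2018-04-23 15:19:47.667", "2018-04-23 18:05:25.417",
            "2018-04-23 18:58:20.776", "2018-04-23 18:59:50.883",
            "2018-04-23 19:29:30.500", "2018-04-24 00:23:45.673",
            "2018-04-24 06:55:29.529", "2018-04-24 14:08:19.080",
            "2018-04-24 17:58:30.757", "2018-04-24 19:33:41.127",
        ]),
        "Time": [60, 360, 45, 240, 180, 120, 300, 0, 10, 270, 120, 600],
        "Author": [1744025, 1244021, 1244021, 1249997, 2185555, 2185555,
                   1686620, 1249997, 1244021, 1686620, 1416825, 1249997],
        "ID": [44, 10, 10, 19, 19, 19, 19, 19, 10, 19, 39, 19],
    }
).set_index("Date")

# Label of the first row where Time == 0 (ID plays no role here)
first_zero = (df.Time == 0).idxmax()

# Keep everything from that label onward, the 0 row included
attempt = df.loc[first_zero:]
print(attempt)
```

The ID 44 and ID 10 rows from 2018-04-23 are lost even though those IDs never hit 0, and the 0 row itself survives.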

So how can I do this in the most "pythonic" way?

1 answer:

Answer 0 (score: 1)

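You can use the groupby + cumsum trick here: walk the rows in reverse, keep a per-ID running count of Time == 0, and drop every row whose count is positive, i.e. every row at or before the last 0 of its ID:

df[~df.Time.eq(0)[::-1].groupby(df.ID, sort=False).cumsum().gt(0)[::-1]]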
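The one-liner can be unpacked into named steps. This is a minimal sketch on the sample frame; the intermediate names are illustrative, and the explicit .astype(int) is an added safeguard so the grouped cumsum is always an integer count (applying ~ directly to integer counts would give -1/-2 instead of a boolean mask). The trailing [::-1] restores chronological order so the mask lines up with df.index without reindexing.

```python
import pandas as pd

# Sample data from the question, with Date as the index
df = pd.DataFrame(
    {
        "Date": pd.to_datetime([
            "2018-04-23 08:09:52.558", "2018-04-23 14:26:12.294",
            "2018-04-23 15:19:47.667", "2018-04-23 18:05:25.417",
            "2018-04-23 18:58:20.776", "2018-04-23 18:59:50.883",
            "2018-04-23 19:29:30.500", "2018-04-24 00:23:45.673",
            "2018-04-24 06:55:29.529", "2018-04-24 14:08:19.080",
            "2018-04-24 17:58:30.757", "2018-04-24 19:33:41.127",
        ]),
        "Time": [60, 360, 45, 240, 180, 120, 300, 0, 10, 270, 120, 600],
        "Author": [1744025, 1244021, 1244021, 1249997, 2185555, 2185555,
                   1686620, 1249997, 1244021, 1686620, 1416825, 1249997],
        "ID": [44, 10, 10, 19, 19, 19, 19, 19, 10, 19, 39, 19],
    }
).set_index("Date")

# 1. Flag zero rows as 0/1 and walk the frame from last row to first
is_zero_rev = df.Time.eq(0).astype(int)[::-1]

# 2. Per ID, count the zeros seen so far in reverse order,
#    i.e. how many zeros of that ID occur at or after each row
zeros_at_or_after = is_zero_rev.groupby(df.ID[::-1], sort=False).cumsum()

# 3. Back to chronological order: rows with a count > 0 are dropped
drop = zeros_at_or_after.gt(0)[::-1]
result = df[~drop]
print(result)
```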

                         Time   Author  ID
Date                                      
2018-04-23 08:09:52.558    60  1744025  44
2018-04-23 14:26:12.294   360  1244021  10
2018-04-23 15:19:47.667    45  1244021  10
2018-04-24 06:55:29.529    10  1244021  10
2018-04-24 14:08:19.080   270  1686620  19
2018-04-24 17:58:30.757   120  1416825  39
2018-04-24 19:33:41.127   600  1249997  19
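Closer to the asker's idxmax idea, an alternative (not part of the answer above) is to find, for each ID, the row position of its last 0 and keep only rows after it. A minimal sketch, assuming the frame is sorted by Date so row position reflects time order; pos and last_zero are hypothetical helper names.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with Date as the index
df = pd.DataFrame(
    {
        "Date": pd.to_datetime([
            "2018-04-23 08:09:52.558", "2018-04-23 14:26:12.294",
            "2018-04-23 15:19:47.667", "2018-04-23 18:05:25.417",
            "2018-04-23 18:58:20.776", "2018-04-23 18:59:50.883",
            "2018-04-23 19:29:30.500", "2018-04-24 00:23:45.673",
            "2018-04-24 06:55:29.529", "2018-04-24 14:08:19.080",
            "2018-04-24 17:58:30.757", "2018-04-24 19:33:41.127",
        ]),
        "Time": [60, 360, 45, 240, 180, 120, 300, 0, 10, 270, 120, 600],
        "Author": [1744025, 1244021, 1244021, 1249997, 2185555, 2185555,
                   1686620, 1249997, 1244021, 1686620, 1416825, 1249997],
        "ID": [44, 10, 10, 19, 19, 19, 19, 19, 10, 19, 39, 19],
    }
).set_index("Date")

# Row position of every row (assumes rows are already in time order)
pos = pd.Series(np.arange(len(df)), index=df.index)

# Position of the last 0 within each ID, broadcast back to every row;
# IDs with no 0 get NaN from the max, replaced by -1 so nothing is dropped
last_zero = (
    pos.where(df.Time.eq(0))
    .groupby(df.ID)
    .transform("max")
    .fillna(-1)
)

# Keep only rows that come strictly after their ID's last 0
result = df[pos > last_zero]
print(result)
```

Like the cumsum trick, this drops everything up to the last 0 when an ID has several zeros.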