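I have this DataFrame.
For each IMEI, I want to check whether a subsequent DATETIME OF LVD occurs within 12 hours of the previous one. If it does, that row needs to be removed.
For example, in this df, rows 1, 6, 13, 14 and 15 need to be removed.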
IMEI DATETIME OF LVD
0 864811031001402 2018-10-04 23:50:00
1 864811031001402 2018-10-05 04:35:00
2 864811031001402 2018-10-15 03:40:00
3 864811031001402 2018-10-21 04:25:00
4 866710038341548 2018-10-27 05:53:00
5 864811031092336 2018-10-17 18:10:00
6 864811031092336 2018-10-17 18:41:00
7 864811031092336 2018-10-21 04:50:00
8 864811031092336 2018-10-23 03:21:00
9 864811031092336 2018-10-24 03:00:00
10 864811031009041 2018-10-13 21:52:00
11 864811031009041 2018-10-27 11:13:00
12 864811031015584 2018-10-27 00:48:00
13 864811031015584 2018-10-28 05:25:00
14 864811031015584 2018-10-28 05:26:00
15 864811031015584 2018-10-28 05:27:00
I can get the time difference between consecutive records (below), but how do I do this for each IMEI group?
df['Delta'] = pd.to_datetime(df['DATETIME OF LVD']).diff()
IMEI DATETIME OF LVD Delta
0 864811031001402 2018-10-04 23:50:00 NaT
1 864811031001402 2018-10-05 04:35:00 0 days 04:45:00
2 864811031001402 2018-10-15 03:40:00 9 days 23:05:00
3 864811031001402 2018-10-21 04:25:00 6 days 00:45:00
4 866710038341548 2018-10-27 05:53:00 6 days 01:28:00
5 864811031092336 2018-10-17 18:10:00 -10 days +12:17:00
6 864811031092336 2018-10-17 18:41:00 0 days 00:31:00
7 864811031092336 2018-10-21 04:50:00 3 days 10:09:00
8 864811031092336 2018-10-23 03:21:00 1 days 22:31:00
9 864811031092336 2018-10-24 03:00:00 0 days 23:39:00
10 864811031009041 2018-10-13 21:52:00 -11 days +18:52:00
11 864811031009041 2018-10-27 11:13:00 13 days 13:21:00
12 864811031015584 2018-10-27 00:48:00 -1 days +13:35:00
13 864811031015584 2018-10-28 05:25:00 1 days 04:37:00
14 864811031015584 2018-10-28 05:26:00 0 days 00:01:00
15 864811031015584 2018-10-28 05:27:00 0 days 00:01:00
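Answer 0 (score: 0)
Use DataFrameGroupBy.diff and filter by boolean indexing with two boolean masks: one compares the per-group differences with a Timedelta, the other uses isna to catch the missing values (the first row of each IMEI has no previous row, so its difference is NaT). The two masks are chained with | for bitwise OR: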
df['DATETIME OF LVD'] = pd.to_datetime(df['DATETIME OF LVD'])
s = df.groupby('IMEI')['DATETIME OF LVD'].diff()
df = df[(s > pd.Timedelta('12 hour')) | s.isna()]
print (df)
IMEI DATETIME OF LVD
0 864811031001402 2018-10-04 23:50:00
2 864811031001402 2018-10-15 03:40:00
3 864811031001402 2018-10-21 04:25:00
4 866710038341548 2018-10-27 05:53:00
5 864811031092336 2018-10-17 18:10:00
7 864811031092336 2018-10-21 04:50:00
8 864811031092336 2018-10-23 03:21:00
9 864811031092336 2018-10-24 03:00:00
10 864811031009041 2018-10-13 21:52:00
11 864811031009041 2018-10-27 11:13:00
12 864811031015584 2018-10-27 00:48:00
13 864811031015584 2018-10-28 05:25:00
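The same filter can be sketched as a self-contained script. This is only a minimal illustration on a subset of the question's rows; the helper name drop_within_window and its window parameter are made up for this example. It also sorts by IMEI and timestamp first, on the assumption that the input might not already be in chronological order within each IMEI, because diff() only compares a row with the one directly above it in its group.

```python
import pandas as pd

# Subset of the question's data: two of the IMEIs.
df = pd.DataFrame({
    "IMEI": [864811031001402] * 3 + [864811031015584] * 4,
    "DATETIME OF LVD": [
        "2018-10-04 23:50:00", "2018-10-05 04:35:00", "2018-10-15 03:40:00",
        "2018-10-27 00:48:00", "2018-10-28 05:25:00",
        "2018-10-28 05:26:00", "2018-10-28 05:27:00",
    ],
})

def drop_within_window(frame, window="12h"):
    """Hypothetical helper: drop rows that follow the previous row
    of the same IMEI by `window` or less."""
    out = frame.copy()
    out["DATETIME OF LVD"] = pd.to_datetime(out["DATETIME OF LVD"])
    # Sort so diff() compares each row with the chronologically previous one.
    out = out.sort_values(["IMEI", "DATETIME OF LVD"], kind="stable")
    gap = out.groupby("IMEI")["DATETIME OF LVD"].diff()
    # Keep rows with a large enough gap, plus the first row of each IMEI (NaT).
    keep = (gap > pd.Timedelta(window)) | gap.isna()
    # Restore the original row order.
    return out[keep].sort_index()

result = drop_within_window(df)
print(result)
```

In this subset the 2018-10-28 05:25:00 row is kept, since it comes 1 day 04:37:00 after the previous record for that IMEI; only the 05:26 and 05:27 rows of that IMEI are dropped.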
Details:
print (s)
0 NaT
1 0 days 04:45:00
2 9 days 23:05:00
3 6 days 00:45:00
4 NaT
5 NaT
6 0 days 00:31:00
7 3 days 10:09:00
8 1 days 22:31:00
9 0 days 23:39:00
10 NaT
11 13 days 13:21:00
12 NaT
13 1 days 04:37:00
14 0 days 00:01:00
15 0 days 00:01:00
Name: DATETIME OF LVD, dtype: timedelta64[ns]
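One thing to be aware of: diff() measures the gap to the previous row, even when that previous row is itself dropped. For the data in the question this makes no difference, but for a run of records spaced less than 12 hours apart, every record after the first is removed. If the intent is instead to compare each record with the last record that was kept, a plain loop per group is one way to do it. The sketch below uses made-up timestamps and IMEI values (not from the question), and keep_spaced is a hypothetical helper name.

```python
import pandas as pd

def keep_spaced(times, window=pd.Timedelta("12h")):
    """Hypothetical helper: mark rows that come more than `window`
    after the last KEPT row (times must be in chronological order)."""
    keep, last = [], None
    for t in times:
        ok = last is None or (t - last) > window
        keep.append(ok)
        if ok:
            last = t
    return keep

# Illustrative data only: IMEI 1 has a record every 8 hours.
df = pd.DataFrame({
    "IMEI": [1, 1, 1, 1, 2],
    "DATETIME OF LVD": pd.to_datetime([
        "2018-10-28 00:00", "2018-10-28 08:00",
        "2018-10-28 16:00", "2018-10-29 00:00",
        "2018-10-28 01:00",
    ]),
})

# Rule from the answer: compare with the previous row in the group.
gap = df.groupby("IMEI")["DATETIME OF LVD"].diff()
diff_mask = (gap > pd.Timedelta("12h")) | gap.isna()

# Alternative rule: compare with the last kept row in the group.
kept_mask = pd.Series(False, index=df.index)
for _, grp in df.groupby("IMEI"):
    kept_mask.loc[grp.index] = keep_spaced(grp["DATETIME OF LVD"])

print(df[diff_mask])
print(df[kept_mask])
```

With the previous-row rule only the first record of IMEI 1 survives; with the last-kept rule the 16:00 record survives too, because it is 16 hours after the last kept record. Which rule is right depends on what "within 12 hours" is meant to cover.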