Pandas DataFrame: remove rows that fall within 12 hours of the previous one, using DateTime

Time: 2018-12-12 14:39:55

Tags: python pandas dataframe

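I have this DataFrame:

For each IMEI, I want to check whether a subsequent DATETIME OF LVD occurs within 12 hours of the previous one. If it does, that row needs to be removed.

For example, in this df, rows 1, 6, 13, 14, and 15 need to be removed.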

               IMEI      DATETIME OF LVD  
0   864811031001402  2018-10-04 23:50:00         
1   864811031001402  2018-10-05 04:35:00         
2   864811031001402  2018-10-15 03:40:00       
3   864811031001402  2018-10-21 04:25:00        
4   866710038341548  2018-10-27 05:53:00       
5   864811031092336  2018-10-17 18:10:00         
6   864811031092336  2018-10-17 18:41:00       
7   864811031092336  2018-10-21 04:50:00          
8   864811031092336  2018-10-23 03:21:00         
9   864811031092336  2018-10-24 03:00:00        
10  864811031009041  2018-10-13 21:52:00       
11  864811031009041  2018-10-27 11:13:00       
12  864811031015584  2018-10-27 00:48:00        
13  864811031015584  2018-10-28 05:25:00        
14  864811031015584  2018-10-28 05:26:00        
15  864811031015584  2018-10-28 05:27:00   

I can get the time difference between consecutive records (below), but how do I do this separately for each IMEI group?

df['Delta'] = pd.to_datetime(df['DATETIME OF LVD']).diff()

               IMEI      DATETIME OF LVD              Delta
0   864811031001402  2018-10-04 23:50:00                NaT
1   864811031001402  2018-10-05 04:35:00    0 days 04:45:00
2   864811031001402  2018-10-15 03:40:00    9 days 23:05:00
3   864811031001402  2018-10-21 04:25:00    6 days 00:45:00
4   866710038341548  2018-10-27 05:53:00    6 days 01:28:00
5   864811031092336  2018-10-17 18:10:00 -10 days +12:17:00
6   864811031092336  2018-10-17 18:41:00    0 days 00:31:00
7   864811031092336  2018-10-21 04:50:00    3 days 10:09:00
8   864811031092336  2018-10-23 03:21:00    1 days 22:31:00
9   864811031092336  2018-10-24 03:00:00    0 days 23:39:00
10  864811031009041  2018-10-13 21:52:00 -11 days +18:52:00
11  864811031009041  2018-10-27 11:13:00   13 days 13:21:00
12  864811031015584  2018-10-27 00:48:00  -1 days +13:35:00
13  864811031015584  2018-10-28 05:25:00    1 days 04:37:00
14  864811031015584  2018-10-28 05:26:00    0 days 00:01:00
15  864811031015584  2018-10-28 05:27:00    0 days 00:01:00

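1 Answer:

Answer 0 (score: 0)

Use DataFrameGroupBy.diff to compute the time difference between consecutive rows within each IMEI group, then filter with boolean indexing on two masks chained with | (bitwise OR): one keeps rows whose Timedelta exceeds 12 hours, the other keeps rows where the difference is missing (the first row of each group).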

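# Parse the column as datetimes so that differences are Timedeltas
df['DATETIME OF LVD'] = pd.to_datetime(df['DATETIME OF LVD'])

# Gap to the previous row within the same IMEI (NaT for each group's first row)
s = df.groupby('IMEI')['DATETIME OF LVD'].diff()
# Keep rows more than 12 hours after the previous one, plus each group's first row
df = df[(s > pd.Timedelta('12 hour')) | s.isna()]
print(df)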
               IMEI     DATETIME OF LVD
0   864811031001402 2018-10-04 23:50:00
2   864811031001402 2018-10-15 03:40:00
3   864811031001402 2018-10-21 04:25:00
4   866710038341548 2018-10-27 05:53:00
5   864811031092336 2018-10-17 18:10:00
7   864811031092336 2018-10-21 04:50:00
8   864811031092336 2018-10-23 03:21:00
9   864811031092336 2018-10-24 03:00:00
10  864811031009041 2018-10-13 21:52:00
11  864811031009041 2018-10-27 11:13:00
12  864811031015584 2018-10-27 00:48:00
13  864811031015584 2018-10-28 05:25:00
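
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same groupby/diff filter. The sample rows are copied from the question. The `sort_values` step and the names `ordered`, `gap`, `keep`, and `result` are my own additions: the sort is there on the assumption that real data may not arrive in chronological order within each IMEI, which `diff()` silently relies on.

```python
import pandas as pd

# Sample rows copied from the question.
rows = [
    (864811031001402, "2018-10-04 23:50:00"),
    (864811031001402, "2018-10-05 04:35:00"),
    (864811031001402, "2018-10-15 03:40:00"),
    (864811031001402, "2018-10-21 04:25:00"),
    (866710038341548, "2018-10-27 05:53:00"),
    (864811031092336, "2018-10-17 18:10:00"),
    (864811031092336, "2018-10-17 18:41:00"),
    (864811031092336, "2018-10-21 04:50:00"),
    (864811031092336, "2018-10-23 03:21:00"),
    (864811031092336, "2018-10-24 03:00:00"),
    (864811031009041, "2018-10-13 21:52:00"),
    (864811031009041, "2018-10-27 11:13:00"),
    (864811031015584, "2018-10-27 00:48:00"),
    (864811031015584, "2018-10-28 05:25:00"),
    (864811031015584, "2018-10-28 05:26:00"),
    (864811031015584, "2018-10-28 05:27:00"),
]
df = pd.DataFrame(rows, columns=["IMEI", "DATETIME OF LVD"])
df["DATETIME OF LVD"] = pd.to_datetime(df["DATETIME OF LVD"])

# Assumption: sort first so diff() compares chronological neighbours per IMEI.
ordered = df.sort_values(["IMEI", "DATETIME OF LVD"])
gap = ordered.groupby("IMEI")["DATETIME OF LVD"].diff()

# Keep the first row of each IMEI (NaT gap) and rows more than 12 hours later.
keep = (gap > pd.Timedelta(hours=12)) | gap.isna()

# Realign the mask to the original row order before filtering.
result = df[keep.reindex(df.index)]
print(result.index.tolist())
```

With this data the rows dropped are 1, 6, 14, and 15. Row 13 is kept because it comes 1 day 04:37:00 after row 12, which is more than 12 hours, even though the question lists it for removal.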

Details:

print(s)
0                 NaT
1     0 days 04:45:00
2     9 days 23:05:00
3     6 days 00:45:00
4                 NaT
5                 NaT
6     0 days 00:31:00
7     3 days 10:09:00
8     1 days 22:31:00
9     0 days 23:39:00
10                NaT
11   13 days 13:21:00
12                NaT
13    1 days 04:37:00
14    0 days 00:01:00
15    0 days 00:01:00
Name: DATETIME OF LVD, dtype: timedelta64[ns]
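
One caveat worth spelling out: `diff()` measures the gap to the immediately preceding row, whether or not that row is itself kept. If the requirement is instead "drop anything within 12 hours of the last row that was kept", the two readings differ for chains of closely spaced events. The sketch below contrasts them on made-up timestamps. The helper `keep_spaced` and the sample frame `events` are hypothetical and not part of the original answer.

```python
import pandas as pd

def keep_spaced(times: pd.Series, min_gap: pd.Timedelta) -> pd.Series:
    """Hypothetical helper: keep a timestamp only if it is more than
    min_gap after the last *kept* timestamp."""
    ordered = times.sort_values()
    flags = []
    last_kept = None
    for t in ordered:
        keep = last_kept is None or (t - last_kept) > min_gap
        if keep:
            last_kept = t
        flags.append(keep)
    # Return the mask aligned to the caller's original row order.
    return pd.Series(flags, index=ordered.index).reindex(times.index)

col = "DATETIME OF LVD"
gap = pd.Timedelta(hours=12)

# Made-up data: four events for one IMEI, each 10 hours after the previous one.
events = pd.DataFrame({
    "IMEI": [1, 1, 1, 1],
    col: pd.to_datetime([
        "2018-10-01 00:00", "2018-10-01 10:00",
        "2018-10-01 20:00", "2018-10-02 06:00",
    ]),
})

# Reading 1 (the answer's approach): compare with the previous row.
consecutive = events.groupby("IMEI")[col].diff()
mask_diff = (consecutive > gap) | consecutive.isna()

# Reading 2: compare with the last kept row, group by group.
mask_anchor = pd.concat(
    [keep_spaced(group, gap) for _, group in events.groupby("IMEI")[col]]
).reindex(events.index)

print(mask_diff.tolist())    # [True, False, False, False]
print(mask_anchor.tolist())  # [True, False, True, False]
```

For the data in the question both readings give the same result, so the vectorized `diff()` version is the simpler choice there. The loop-based variant only matters when events can chain together at intervals shorter than 12 hours.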