Filter a pandas DataFrame by the difference between adjacent rows

Time: 2017-10-24 08:25:40

Tags: python python-3.x pandas datetime dataframe

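I have a DataFrame indexed by datetime. I want to filter out rows based on the difference between each row's index and the index of the previous row.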

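So if my criterion is "remove every row that comes more than one hour after the previous row", the second row in the example below should be removed: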

2005-07-15 17:00:00  
2005-07-17 18:00:00  

In the following case, both rows are kept:

2005-07-17 23:00:00  
2005-07-18 00:00:00 

1 answer:

Answer 0 (score: 2)

It seems you need boolean indexing on the diff of the index, compared against a 1 hour Timedelta:

import pandas as pd

dates = ['2005-07-15 17:00:00', '2005-07-17 18:00:00', '2005-07-17 19:00:00',
         '2005-07-17 23:00:00', '2005-07-18 00:00:00']
df = pd.DataFrame({'a': range(5)}, index=pd.to_datetime(dates))

print (df)
                     a
2005-07-15 17:00:00  0
2005-07-17 18:00:00  1
2005-07-17 19:00:00  2
2005-07-17 23:00:00  3
2005-07-18 00:00:00  4
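# The first row has no previous row, so its diff is NaT;
# fill it with a zero Timedelta so that the first row is kept.
diff = df.index.to_series().diff().fillna(pd.Timedelta(0))
print(diff)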
2005-07-15 17:00:00   0 days 00:00:00
2005-07-17 18:00:00   2 days 01:00:00
2005-07-17 19:00:00   0 days 01:00:00
2005-07-17 23:00:00   0 days 04:00:00
2005-07-18 00:00:00   0 days 01:00:00
dtype: timedelta64[ns]

# True where a row is at most one hour after the previous row
mask = diff <= pd.Timedelta(1, unit='h')
print(mask)
2005-07-15 17:00:00     True
2005-07-17 18:00:00    False
2005-07-17 19:00:00     True
2005-07-17 23:00:00    False
2005-07-18 00:00:00     True
dtype: bool

# Keep only the rows where the mask is True
df = df[mask]
print(df)
                     a
2005-07-15 17:00:00  0
2005-07-17 19:00:00  2
2005-07-18 00:00:00  4
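
If you need this in more than one place, the steps above can be wrapped in a small helper. This is only a sketch: the name drop_rows_after_gap and its max_gap parameter are made up for illustration and are not part of pandas. It assumes the index is a DatetimeIndex and that every row is compared with the previous row of the original frame, not with the previous row that survived the filter.

```python
import pandas as pd


def drop_rows_after_gap(df, max_gap):
    """Keep rows whose index is at most max_gap after the previous row's index.

    Hypothetical helper for illustration; assumes df has a DatetimeIndex.
    """
    # Difference between each index value and the one before it.
    # The first row has no previous row, so its difference is NaT.
    gaps = df.index.to_series().diff()
    # Treat the first row as having a zero gap so that it is always kept.
    gaps = gaps.fillna(pd.Timedelta(0))
    # Use a plain boolean array so no index alignment is involved.
    keep = (gaps <= max_gap).to_numpy()
    return df[keep]


dates = ['2005-07-15 17:00:00', '2005-07-17 18:00:00', '2005-07-17 19:00:00',
         '2005-07-17 23:00:00', '2005-07-18 00:00:00']
df = pd.DataFrame({'a': range(5)}, index=pd.to_datetime(dates))

filtered = drop_rows_after_gap(df, pd.Timedelta(hours=1))
print(filtered)
```

Because the comparison is <=, a gap of exactly one hour is kept, which matches the question's "more than one hour" criterion. Change the comparison if you want the boundary case to be dropped as well.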