Calculating the time difference between events in pandas

Date: 2014-08-14 05:18:16

Tags: python datetime pandas

+--------------------------------------------------------------+
| 2014-08-12T10:30:14.6938893+10:00     Reading received START |
| 2014-08-12T10:30:14.6938893+10:00       Reading received ADD |
| 2014-08-12T10:30:14.7094893+10:00    Reading received UPDATE |
| 2014-08-12T10:30:14.7094893+10:00    Reading received COMMIT |
| 2014-08-12T10:30:14.7094893+10:00               Commit start |
| 2014-08-12T10:30:14.7406893+10:00                 Commit end |
| 2014-08-12T10:30:14.7406893+10:00    Reading received FINISH |
| 2014-08-12T10:30:23.3206893+10:00     Reading received START |
| 2014-08-12T10:30:23.3206893+10:00       Reading received ADD |
| 2014-08-12T10:30:23.3362893+10:00    Reading received UPDATE |
| 2014-08-12T10:30:23.3362893+10:00    Reading received COMMIT |
| 2014-08-12T10:30:23.3362893+10:00               Commit start |
| 2014-08-12T10:30:23.3674893+10:00                 Commit end |
| 2014-08-12T10:30:23.3674893+10:00    Reading received FINISH |
+--------------------------------------------------------------+
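To try the snippets below, the log first has to be loaded into a DataFrame. This is a minimal sketch, assuming each line is an ISO 8601 timestamp followed by whitespace and then the event text. The names RAW_LOG and parse_log are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical copy of the log from the question: timestamp, whitespace, event text.
RAW_LOG = """\
2014-08-12T10:30:14.6938893+10:00 Reading received START
2014-08-12T10:30:14.6938893+10:00 Reading received ADD
2014-08-12T10:30:14.7094893+10:00 Reading received UPDATE
2014-08-12T10:30:14.7094893+10:00 Reading received COMMIT
2014-08-12T10:30:14.7094893+10:00 Commit start
2014-08-12T10:30:14.7406893+10:00 Commit end
2014-08-12T10:30:14.7406893+10:00 Reading received FINISH
2014-08-12T10:30:23.3206893+10:00 Reading received START
2014-08-12T10:30:23.3206893+10:00 Reading received ADD
2014-08-12T10:30:23.3362893+10:00 Reading received UPDATE
2014-08-12T10:30:23.3362893+10:00 Reading received COMMIT
2014-08-12T10:30:23.3362893+10:00 Commit start
2014-08-12T10:30:23.3674893+10:00 Commit end
2014-08-12T10:30:23.3674893+10:00 Reading received FINISH
"""


def parse_log(text: str) -> pd.DataFrame:
    """Split each non-empty line into (timestamp, event) and build a DataFrame."""
    rows = [line.split(maxsplit=1) for line in text.splitlines() if line.strip()]
    df = pd.DataFrame(rows, columns=["Time", "Event"])
    # All rows share the +10:00 offset, so this yields one tz-aware datetime column.
    df["Time"] = pd.to_datetime(df["Time"])
    return df


df = parse_log(RAW_LOG)
print(df.head())
```

The question's code works on the Time column as built here; for the layout the answer assumes (timestamps in the index), df.set_index("Time") moves them there.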

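Given a time series whose values describe events, how can I calculate the time deltas between recurring events, for example the average difference between "Reading received START" and the "Reading received FINISH" that follows it?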

Is there a better way than this?

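# Select the START rows and the FINISH rows separately
left = df[df.Event == 'Reading received START']
right = df[df.Event == 'Reading received FINISH']
# Renumber both from 0 so the n-th START lines up with the n-th FINISH
left.index = range(len(left))
right.index = range(len(right))
# Element-wise FINISH - START duration for each pair
delta = (right.Time - left.Time)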
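One alternative (a suggestion, not taken from the original thread) that does not rely on the START and FINISH counts matching up: label every row with a session number from a running count of START events, then take the first START and first FINISH time within each session. The event names come from the question; the six-row frame is a hypothetical excerpt of the log, session_durations is a hypothetical helper, and the sketch assumes the log is sorted by time.

```python
import pandas as pd

START = "Reading received START"
FINISH = "Reading received FINISH"

# Hypothetical excerpt of the question's log (START, ADD and FINISH rows only).
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2014-08-12T10:30:14.6938893+10:00",
        "2014-08-12T10:30:14.6938893+10:00",
        "2014-08-12T10:30:14.7406893+10:00",
        "2014-08-12T10:30:23.3206893+10:00",
        "2014-08-12T10:30:23.3206893+10:00",
        "2014-08-12T10:30:23.3674893+10:00",
    ]),
    "Event": [START, "Reading received ADD", FINISH,
              START, "Reading received ADD", FINISH],
})


def session_durations(df: pd.DataFrame) -> pd.Series:
    """FINISH time minus START time for each session, keyed by session number."""
    # Every START row opens a new session, so a running count of STARTs
    # tags each row with the number of the session it belongs to.
    labeled = df.assign(session=(df["Event"] == START).cumsum())
    starts = labeled[labeled["Event"] == START].groupby("session")["Time"].first()
    finishes = labeled[labeled["Event"] == FINISH].groupby("session")["Time"].first()
    # Subtraction aligns on the session number; a session missing either event gives NaT.
    return finishes - starts


durations = session_durations(df)
print(durations.mean())
```

Because the pairing is done by session label rather than by position, a session with a missing FINISH yields NaT for that session only, and mean() skips NaT values, instead of shifting every later pair out of alignment.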

1 Answer:

Answer 0 (score: 1)

For clarity, I'm assuming that what you've shown is the index plus one column (called 'Event') of a larger DataFrame. Is that right? If so:

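import pandas as pd

relevant_df = df[df.Event.isin(['Reading received START', 'Reading received FINISH'])]
relevant_ts_as_series = pd.Series(relevant_df.index)
# Gap between each relevant timestamp and the previous one (the first entry is NaT)
diff = relevant_ts_as_series - relevant_ts_as_series.shift()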

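If you want the average, note that diff alternates between START→FINISH and FINISH→START gaps (after the initial NaT), so diff.iloc[1::2].mean() gives the mean START→FINISH duration, assuming the first relevant row is a START and the two events strictly alternate.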
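To see the answer's approach end to end, here it is run on a hypothetical six-row excerpt of the question's log, arranged the way the answer assumes (timestamps in the index, one Event column). It assumes the START and FINISH rows strictly alternate, starting with a START.

```python
import pandas as pd

START = "Reading received START"
FINISH = "Reading received FINISH"

# Hypothetical frame in the answer's layout: timestamps as the index.
times = pd.to_datetime([
    "2014-08-12T10:30:14.6938893+10:00",
    "2014-08-12T10:30:14.6938893+10:00",
    "2014-08-12T10:30:14.7406893+10:00",
    "2014-08-12T10:30:23.3206893+10:00",
    "2014-08-12T10:30:23.3206893+10:00",
    "2014-08-12T10:30:23.3674893+10:00",
])
df = pd.DataFrame(
    {"Event": [START, "Reading received ADD", FINISH,
               START, "Reading received ADD", FINISH]},
    index=times,
)

relevant_df = df[df.Event.isin([START, FINISH])]
relevant_ts_as_series = pd.Series(relevant_df.index)
diff = relevant_ts_as_series - relevant_ts_as_series.shift()

# diff holds [NaT, START->FINISH, FINISH->START, START->FINISH];
# the odd positions are the START -> FINISH durations.
start_to_finish = diff.iloc[1::2]
print(start_to_finish.mean())
```

On this excerpt the FINISH→START gap (8.58 s) is far larger than the START→FINISH durations (46.8 ms each), so calling diff.mean() on the whole series would mostly measure the time between readings rather than how long each reading takes.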

I bet there's a more elegant way than converting the index to a Series, but this should work for you.
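One slightly tidier option (a suggestion, not the answerer's code) is to call Index.to_series() and Series.diff() instead of wrapping the index with pd.Series and subtracting a shifted copy by hand, and then keep only the gaps that end on a FINISH row. The data is the same hypothetical excerpt as above, and the sketch still assumes START and FINISH strictly alternate.

```python
import pandas as pd

START = "Reading received START"
FINISH = "Reading received FINISH"

# Hypothetical frame in the answer's layout: timestamps as the index.
df = pd.DataFrame(
    {"Event": [START, "Reading received ADD", FINISH,
               START, "Reading received ADD", FINISH]},
    index=pd.to_datetime([
        "2014-08-12T10:30:14.6938893+10:00",
        "2014-08-12T10:30:14.6938893+10:00",
        "2014-08-12T10:30:14.7406893+10:00",
        "2014-08-12T10:30:23.3206893+10:00",
        "2014-08-12T10:30:23.3206893+10:00",
        "2014-08-12T10:30:23.3674893+10:00",
    ]),
)

relevant = df[df["Event"].isin([START, FINISH])]
# to_series() gives a Series whose values are the timestamps;
# diff() subtracts the previous value from each one (the first result is NaT).
gaps = relevant.index.to_series().diff()
# Keep the gap ending on each FINISH row, i.e. FINISH minus the preceding START.
# The mask is a NumPy array, so it is applied by position even if timestamps repeat.
start_to_finish = gaps[(relevant["Event"] == FINISH).to_numpy()]
print(start_to_finish.mean())
```

Selecting the gaps by the FINISH mask, rather than by iloc[1::2], also keeps working if the filtered rows happen to begin with a stray FINISH. In that case the first FINISH row's gap is NaT, which mean() skips.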