+--------------------------------------------------------------+
| 2014-08-12T10:30:14.6938893+10:00 Reading received START |
| 2014-08-12T10:30:14.6938893+10:00 Reading received ADD |
| 2014-08-12T10:30:14.7094893+10:00 Reading received UPDATE |
| 2014-08-12T10:30:14.7094893+10:00 Reading received COMMIT |
| 2014-08-12T10:30:14.7094893+10:00 Commit start |
| 2014-08-12T10:30:14.7406893+10:00 Commit end |
| 2014-08-12T10:30:14.7406893+10:00 Reading received FINISH |
| 2014-08-12T10:30:23.3206893+10:00 Reading received START |
| 2014-08-12T10:30:23.3206893+10:00 Reading received ADD |
| 2014-08-12T10:30:23.3362893+10:00 Reading received UPDATE |
| 2014-08-12T10:30:23.3362893+10:00 Reading received COMMIT |
| 2014-08-12T10:30:23.3362893+10:00 Commit start |
| 2014-08-12T10:30:23.3674893+10:00 Commit end |
| 2014-08-12T10:30:23.3674893+10:00 Reading received FINISH |
+--------------------------------------------------------------+
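Given a time series whose values describe events, how can I compute the time delta between recurring events, for example the average difference between "Reading received START" and the subsequent "Reading received FINISH"?
Is there a better way than this, for example: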
left = df[df.Event == 'Reading received START']
right = df[df.Event == 'Reading received FINISH']
left.index = range(len(left))
right.index = range(len(right))
delta = (right.Time - left.Time)
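Answer 0 (score: 1)
To be clear, I'm assuming you are showing the index and one column (called 'Event') from a larger DataFrame. Is that right? If so, try the following: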
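relevant_df = df[df.Event.isin(['Reading received START', 'Reading received FINISH'])]
relevant_ts_as_series = pd.Series(relevant_df.index)
diff = relevant_ts_as_series - relevant_ts_as_series.shift()
# keep only the FINISH-minus-START gaps (assumes START and FINISH strictly alternate)
diff = diff[relevant_df.Event.values == 'Reading received FINISH']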
If you like, you can then take diff.mean().
I bet there's a more elegant way than converting the index to a Series, but this should work for you.
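One possible alternative, offered only as a rough sketch: number each START/FINISH cycle and let pandas align the two sides on that number instead of resetting indexes by hand. It assumes the timestamps live in a column called `Time` (as in the question's snippet, not in the index), that every START is followed by exactly one FINISH before the next START, and the `cycle` column name is made up for illustration. The sample rows are copied from the log above.

```python
import pandas as pd

START = "Reading received START"
FINISH = "Reading received FINISH"

# A few rows copied from the log in the question.
# Assumption: timestamps are in a "Time" column, events in an "Event" column.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2014-08-12T10:30:14.6938893+10:00",
        "2014-08-12T10:30:14.6938893+10:00",
        "2014-08-12T10:30:14.7406893+10:00",
        "2014-08-12T10:30:23.3206893+10:00",
        "2014-08-12T10:30:23.3206893+10:00",
        "2014-08-12T10:30:23.3674893+10:00",
    ]),
    "Event": [
        START, "Reading received ADD", FINISH,
        START, "Reading received ADD", FINISH,
    ],
})

# Keep only the two events of interest.
pairs = df[df["Event"].isin([START, FINISH])].copy()

# Hypothetical helper column: each START opens a new cycle number,
# and the FINISH that follows it inherits the same number.
pairs["cycle"] = (pairs["Event"] == START).cumsum()

# Index both sides by cycle so the subtraction aligns matching pairs.
start_times = pairs.loc[pairs["Event"] == START].set_index("cycle")["Time"]
finish_times = pairs.loc[pairs["Event"] == FINISH].set_index("cycle")["Time"]

delta = finish_times - start_times   # one Timedelta per cycle
mean_delta = delta.mean()
print(mean_delta)
```

A cycle with a START but no FINISH simply comes out as NaT, which `mean()` skips, so an unfinished trailing cycle should not distort the average.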