pandas: rolling max, min and mean

Time: 2017-04-02 21:28:17

Tags: python pandas time-series

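I am new to pandas and got lost in the myriad of examples that generate smooth random data.

What I have been trying to achieve is a bokeh graph based on a rolling time window. I would like the x-axis to show (resampled or otherwise) timestamps, with 3 lines showing the max, min and mean values of the duration field over a rolling time window of, say, 15 seconds.

The joy stopped before it started... I tried to apply lots of examples without making progress or learning anything.

The following code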

import pandas as pd

d2 = pd.read_csv(input_file, delimiter=",")
d2["ts_send"] = pd.to_datetime(d2["ts_send"],
                               format="%Y-%m-%d %H:%M:%S.%f", exact=True, utc=True)

print(d2.head())
print(d2.rolling("15s", min_periods=1).mean().head())
print(d2.rolling("15s", min_periods=1).std().head())
print(d2.rolling("15s", min_periods=1).min().head())
print(d2.rolling("15s", min_periods=1).max().head())

raises an exception:

ValueError: window must be an integer

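If I could get the rolling part to work, I could probably manage the bokeh side.

Any pointers that help me achieve this are much appreciated!

I have this data in the csv: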

ts_send,endpoint,duration,
2017-01-19 09:03:28.600,/api/sig,1.0
2017-01-19 09:03:29.760,/api/sig,0.5
2017-01-19 09:04:51.210,/api/sig,0.508
2017-01-19 09:04:52.410,/api/sig,0.574
2017-01-19 09:09:32.854,/api/sig,1.0
2017-01-19 09:09:36.776,/api/sig,0.637
2017-01-19 09:14:14.207,/api/sig,0.672
2017-01-19 09:14:16.906,/api/sig,0.533
2017-01-19 11:49:34.939,/api/sig,1.0
2017-01-19 11:49:38.709,/api/sig,0.529
2017-01-19 12:19:01.668,/api/sig,1.0
2017-01-19 12:19:05.559,/api/item,0.169
2017-01-19 12:19:05.559,/api/item,0.102
2017-01-19 12:19:05.559,/api/item,0.44
2017-01-19 12:19:05.585,/api/item,0.173
2017-01-19 12:19:06.633,/api/sig,0.564
2017-01-19 12:27:05.712,/api/sig,0.574
2017-01-19 12:27:08.370,/api/sig,0.497
2017-01-19 12:27:43.319,/api/sig,0.561
2017-01-19 12:27:45.873,/api/sig,0.508
2017-01-19 12:46:15.454,/api/sig,1.0
2017-01-19 12:46:20.409,/api/item,0.173
2017-01-19 12:46:20.427,/api/item,0.163
2017-01-19 12:46:20.457,/api/item,0.169
2017-01-19 12:46:20.474,/api/item,0.162
2017-01-19 12:46:20.618,/api/item,0.209
2017-01-19 12:46:20.642,/api/item,0.172
2017-01-19 12:46:20.695,/api/item,0.26
2017-01-19 12:46:20.698,/api/item,0.193
2017-01-19 12:46:20.788,/api/item,0.193
2017-01-19 12:46:20.822,/api/item,0.232
2017-01-19 12:46:20.873,/api/item,0.164
2017-01-19 12:46:20.875,/api/item,0.142
2017-01-19 12:46:20.905,/api/item,0.356
2017-01-19 12:46:20.998,/api/item,0.199

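The ts_send timestamps have millisecond precision. Sometimes no events are recorded for a while, and sometimes several events occur within the same millisecond.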
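To illustrate what such data looks like to pandas, here is a minimal sketch using a few timestamps copied from the sample above. The variable names (`ts`, `durations`, `dup_count`, `window_sizes`) are only illustrative and not part of the original code. It counts repeated timestamps and shows how many rows fall into a 5-second window ending at each row.

```python
import pandas as pd

# Timestamps copied from the sample data: three events share 12:19:05.559,
# and the last event follows a gap of about eight minutes.
ts = pd.to_datetime([
    "2017-01-19 12:19:01.668",
    "2017-01-19 12:19:05.559",
    "2017-01-19 12:19:05.559",
    "2017-01-19 12:19:05.559",
    "2017-01-19 12:27:05.712",
])
durations = pd.Series([1.0, 0.169, 0.102, 0.44, 0.574], index=ts, name="duration")

# Number of rows whose timestamp repeats an earlier one
dup_count = int(durations.index.duplicated().sum())

# Number of observations inside the 5-second window ending at each row
window_sizes = durations.rolling("5s").count()

print(dup_count)
print(window_sizes)
```

Every row keeps its own result, so duplicated timestamps do not have to be removed first, and a row that follows a long gap simply gets a window containing only itself.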

2 Answers:

Answer 0: (score: 0)

A time-based window such as "15s" only works if your time series is the index. Add this before the rolling calls:

d2.set_index('ts_send', inplace=True)
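As a minimal, self-contained sketch of the difference (the rows are copied from the question's CSV with the trailing header comma removed; `csv_text`, `df`, `indexed` and the other names are assumptions made for illustration), the same rolling call fails on the default integer index and succeeds once the timestamps are the index:

```python
import io
import pandas as pd

# A few rows copied from the question's data
csv_text = """ts_send,endpoint,duration
2017-01-19 09:03:28.600,/api/sig,1.0
2017-01-19 09:03:29.760,/api/sig,0.5
2017-01-19 09:04:51.210,/api/sig,0.508
2017-01-19 09:04:52.410,/api/sig,0.574
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["ts_send"])

# With the default integer index, a time-based window is rejected
try:
    df["duration"].rolling("15s", min_periods=1).mean()
    failed_on_integer_index = False
except ValueError:
    failed_on_integer_index = True

# With the timestamps as a sorted index, the same window is accepted
indexed = df.set_index("ts_send").sort_index()
rolled = indexed["duration"].rolling("15s", min_periods=1)
rolling_mean = rolled.mean()
rolling_min = rolled.min()
rolling_max = rolled.max()

print(failed_on_integer_index)
print(rolling_mean)
```

Here the second row shares a 15-second window with the first one, while the third row starts a new window because more than 15 seconds have passed.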

Answer 1: (score: 0)

Thanks to the kind members Boud and Goyo, I was able to move forward.

This code produces what I need:

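import pandas as pd

d2 = pd.read_csv(input_file, delimiter=",")
d2["ts_send"] = pd.to_datetime(d2["ts_send"], format="%Y-%m-%d %H:%M:%S.%f", exact=True, utc=True)
# Use the timestamps as the index (DatetimeIndex takes no inplace argument)
d2.index = pd.DatetimeIndex(d2.ts_send)
# A time-based window needs a monotonic index
d3 = d2.sort_index()
# Drop the ts_send column, which is now redundant with the index
d3.drop(d3.columns[0], axis=1, inplace=True)

print(d3.index.is_monotonic_increasing)
print(d3.head())

# Roll over the numeric duration column only; newer pandas versions may
# raise an error on non-numeric columns such as endpoint
duration = d3["duration"]
print(duration.rolling("5s", min_periods=1).mean())
print(duration.rolling("5s", min_periods=1).std())
print(duration.rolling("5s", min_periods=1).min())
print(duration.rolling("5s", min_periods=1).max())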
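For the plotting goal from the question, it may be handy to have the three statistics side by side in one frame. The following is only a sketch under assumptions: the rows are copied from the sample data, the 15-second window is the one mentioned in the question, and the names `duration` and `stats` are illustrative.

```python
import pandas as pd

# A few rows copied from the question's data
ts = pd.to_datetime([
    "2017-01-19 12:46:15.454",
    "2017-01-19 12:46:20.409",
    "2017-01-19 12:46:20.427",
    "2017-01-19 12:46:20.457",
])
duration = pd.Series([1.0, 0.173, 0.163, 0.169], index=ts, name="duration")

# One rolling object, three aggregations: the result is a DataFrame with
# one column per statistic, indexed by the original timestamps
stats = duration.rolling("15s", min_periods=1).agg(["min", "max", "mean"])

print(stats)
```

Each column of `stats` can then be drawn as one line against the timestamp index.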