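Being new to pandas, I'm lost among the countless examples of smoothing randomly generated data.
What I've been trying to achieve is a graph built with bokeh over rolling time windows. I'd like timestamps on the x-axis (resampled or otherwise) and three lines showing the max, min and mean values of the duration field over a rolling time window of, say, 15 seconds.
The joy stopped before it started... I've tried to apply many examples without making progress or learning anything.
The following code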
import pandas as pd

d2 = pd.read_csv(input_file, delimiter=",")
d2["ts_send"] = pd.to_datetime(d2["ts_send"], format="%Y-%m-%d %H:%M:%S.%f", exact=True, utc=True)
print(d2.head())
print(d2.rolling("15s", min_periods=1).mean().head())
print(d2.rolling("15s", min_periods=1).std().head())
print(d2.rolling("15s", min_periods=1).min().head())
print(d2.rolling("15s", min_periods=1).max().head())
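raises this exception:
ValueError: window must be an integer
If I could get the rolling part working, I could probably manage the bokeh side.
Any pointers toward achieving this are much appreciated!
I have this data in a CSV file: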
ts_send,endpoint,duration,
2017-01-19 09:03:28.600,/api/sig,1.0
2017-01-19 09:03:29.760,/api/sig,0.5
2017-01-19 09:04:51.210,/api/sig,0.508
2017-01-19 09:04:52.410,/api/sig,0.574
2017-01-19 09:09:32.854,/api/sig,1.0
2017-01-19 09:09:36.776,/api/sig,0.637
2017-01-19 09:14:14.207,/api/sig,0.672
2017-01-19 09:14:16.906,/api/sig,0.533
2017-01-19 11:49:34.939,/api/sig,1.0
2017-01-19 11:49:38.709,/api/sig,0.529
2017-01-19 12:19:01.668,/api/sig,1.0
2017-01-19 12:19:05.559,/api/item,0.169
2017-01-19 12:19:05.559,/api/item,0.102
2017-01-19 12:19:05.559,/api/item,0.44
2017-01-19 12:19:05.585,/api/item,0.173
2017-01-19 12:19:06.633,/api/sig,0.564
2017-01-19 12:27:05.712,/api/sig,0.574
2017-01-19 12:27:08.370,/api/sig,0.497
2017-01-19 12:27:43.319,/api/sig,0.561
2017-01-19 12:27:45.873,/api/sig,0.508
2017-01-19 12:46:15.454,/api/sig,1.0
2017-01-19 12:46:20.409,/api/item,0.173
2017-01-19 12:46:20.427,/api/item,0.163
2017-01-19 12:46:20.457,/api/item,0.169
2017-01-19 12:46:20.474,/api/item,0.162
2017-01-19 12:46:20.618,/api/item,0.209
2017-01-19 12:46:20.642,/api/item,0.172
2017-01-19 12:46:20.695,/api/item,0.26
2017-01-19 12:46:20.698,/api/item,0.193
2017-01-19 12:46:20.788,/api/item,0.193
2017-01-19 12:46:20.822,/api/item,0.232
2017-01-19 12:46:20.873,/api/item,0.164
2017-01-19 12:46:20.875,/api/item,0.142
2017-01-19 12:46:20.905,/api/item,0.356
2017-01-19 12:46:20.998,/api/item,0.199
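The ts_send timestamp has millisecond precision. Sometimes no events are recorded for a while, and sometimes several events occur within the same millisecond.
Answer 0 (score: 0)
This works if your time series is the index: a time-based window such as "15s" needs a DatetimeIndex. Add this after converting ts_send with pd.to_datetime and before the rolling calls: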
d2.set_index('ts_send', inplace=True)
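As a minimal sketch of why this helps: without a datetime-like index, pandas only accepts an integer window size, so an offset string like "15s" is rejected. The rows below are copied from the question's CSV; the inline `csv_text` string is an assumption that stands in for `input_file` so the snippet runs on its own.

```python
import io

import pandas as pd

# A few rows from the question's CSV, inlined so the sketch is self-contained.
csv_text = """ts_send,endpoint,duration
2017-01-19 09:03:28.600,/api/sig,1.0
2017-01-19 09:03:29.760,/api/sig,0.5
2017-01-19 09:04:51.210,/api/sig,0.508
2017-01-19 09:04:52.410,/api/sig,0.574
"""

d2 = pd.read_csv(io.StringIO(csv_text))
d2["ts_send"] = pd.to_datetime(d2["ts_send"], format="%Y-%m-%d %H:%M:%S.%f", utc=True)

# With the default integer index, an offset window such as "15s" is rejected.
try:
    d2["duration"].rolling("15s", min_periods=1).mean()
    offset_rejected = False
except ValueError:
    offset_rejected = True

# With ts_send as a sorted DatetimeIndex, the same call works.
d2 = d2.set_index("ts_send").sort_index()
rolling_mean = d2["duration"].rolling("15s", min_periods=1).mean()
print(rolling_mean)
```

If you would rather keep ts_send as a regular column, `DataFrame.rolling` also takes an `on="ts_send"` argument, provided that column is sorted.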
Answer 1 (score: 0)
Thanks to the kind members Boud and Goyo, I was able to move forward.
This code produces what I need:
import pandas as pd

d2 = pd.read_csv(input_file, delimiter=",")
d2["ts_send"] = pd.to_datetime(d2["ts_send"], format="%Y-%m-%d %H:%M:%S.%f", exact=True, utc=True)
# Time-based windows need a sorted DatetimeIndex, so move ts_send into the index.
d3 = d2.set_index("ts_send").sort_index()
print(d3.index.is_monotonic_increasing)
print(d3.head())
# Roll over the numeric duration column only; endpoint holds strings.
duration = d3["duration"].rolling("5s", min_periods=1)
print(duration.mean())
print(duration.std())
print(duration.min())
print(duration.max())
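For the plotting side the question asked about, here is a rough sketch rather than a finished solution. It collects min, mean and max of duration over a 15 second window into one frame, then draws them as three lines. The inline `csv_text` sample and the `plot_stats` helper are hypothetical names introduced for illustration, and the bokeh part assumes bokeh is installed and reasonably recent (it uses `legend_label`).

```python
import io

import pandas as pd

# Rows from the question's CSV, including three events in the same millisecond.
csv_text = """ts_send,endpoint,duration
2017-01-19 12:19:01.668,/api/sig,1.0
2017-01-19 12:19:05.559,/api/item,0.169
2017-01-19 12:19:05.559,/api/item,0.102
2017-01-19 12:19:05.559,/api/item,0.44
2017-01-19 12:19:05.585,/api/item,0.173
2017-01-19 12:19:06.633,/api/sig,0.564
"""

d2 = pd.read_csv(io.StringIO(csv_text))
d2["ts_send"] = pd.to_datetime(d2["ts_send"], format="%Y-%m-%d %H:%M:%S.%f", utc=True)
d3 = d2.set_index("ts_send").sort_index()

# One column per statistic, aligned to the original timestamps.
roll = d3["duration"].rolling("15s", min_periods=1)
stats = pd.DataFrame({"min": roll.min(), "mean": roll.mean(), "max": roll.max()})
print(stats)


def plot_stats(stats):
    """Draw min, mean and max as three lines on a datetime x-axis (needs bokeh)."""
    from bokeh.plotting import figure, show

    p = figure(x_axis_type="datetime", title="duration, rolling 15 s window")
    # Drop the timezone and pass plain NumPy arrays; the values stay in UTC.
    x = stats.index.tz_localize(None).to_numpy()
    for name, color in [("min", "green"), ("mean", "blue"), ("max", "red")]:
        p.line(x, stats[name].to_numpy(), legend_label=name, color=color)
    show(p)
```

Calling `plot_stats(stats)` should open the plot in a browser. Because the rolling result keeps one row per event, bursts of events show up as dense clusters of points; resampling first (for example with `d3["duration"].resample("1s")`) would be one way to get evenly spaced x values instead.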