Question

In Pandas, as far as I can tell, the rolling_* methods do not provide a way to specify a range (in this case a time range) as the window/bucket.

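I have seen a similar question here: Pandas: rolling mean by time interval. I know I could resample the data, but that is not ideal for large datasets, especially when the window size is relatively small. The solutions in these questions have the same problem: pandas rolling computation with window based on values instead of counts, and Compute EWMA over sparse/irregular TimeSeries in Pandas.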

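Imagine I want to compute a volume-weighted average price (VWAP) over a small time window across a month's worth of data. Resampling the data would produce rows filled with zeros during quiet market periods, blowing the dataset out to oblivion.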
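To make the resampling concern concrete, here is a minimal sketch, assuming the five sample trades from this question with timezone-naive timestamps and a 1-minute bucket (the names `trades` and `per_minute` are only illustrative). It counts how many buckets a resample creates and how many of them contain no trade at all.

```python
import pandas as pd

# Five sample trades (timezone dropped here to keep the sketch self-contained).
trades = pd.DataFrame(
    {"PRICE": [97.295, 97.3, 97.3, 97.305, 97.305], "QTY": [2, 4, 3, 1, 2]},
    index=pd.to_datetime([
        "2015-09-08 10:24:16.671862751",
        "2015-09-08 10:25:33.952672310",
        "2015-09-08 10:38:30.840283893",
        "2015-09-08 11:00:47.536800660",
        "2015-09-08 11:00:47.536896273",
    ]),
)

# Resampling creates one row per minute between the first and last trade,
# whether or not anything traded in that minute.
per_minute = trades["QTY"].resample("1min").sum()

n_buckets = len(per_minute)
n_empty = int((per_minute == 0).sum())
print(len(trades), "trades ->", n_buckets, "buckets,", n_empty, "of them empty")
```

Even on this tiny sample most buckets are empty, and the ratio gets worse as the bucket shrinks or the quiet periods get longer.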

A small sample dataset (with code) is provided below.

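I can easily get a volume-scaled price by multiplying PRICE by QTY for each row.

If one of the pandas rolling methods let me specify a rolling TIME window (possibly as a timedelta), the VWAP would be the rolling sum of that volume-scaled price divided by the rolling sum of QTY over the same window.

The sample data is loaded as follows: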

from io import StringIO  # Python 3 location of StringIO (was the Python 2 StringIO module)
from pytz import timezone
import pandas as pd

s = """TIMESTAMP_DT,PRICE,QTY
2015-09-08 10:24:16.671862751+10:00,97.295,2
2015-09-08 10:25:33.952672310+10:00,97.3,4
2015-09-08 10:38:30.840283893+10:00,97.3,3
2015-09-08 11:00:47.536800660+10:00,97.305,1
2015-09-08 11:00:47.536896273+10:00,97.305,2
"""
SYD = timezone('Australia/Sydney')

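df1 = pd.read_csv(StringIO(s), sep=',', index_col=0)
# The timestamps already carry a +10:00 offset, so parse them as tz-aware UTC
# and convert to Sydney time; tz_localize on a tz-aware index raises a TypeError.
df1.index = pd.to_datetime(df1.index, utc=True)
df1.index = df1.index.tz_convert(SYD)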


                                      PRICE  QTY
TIMESTAMP_DT                                    
2015-09-08 10:24:16.671862751+10:00  97.295    2
2015-09-08 10:25:33.952672310+10:00  97.300    4
2015-09-08 10:38:30.840283893+10:00  97.300    3
2015-09-08 11:00:47.536800660+10:00  97.305    1
2015-09-08 11:00:47.536896273+10:00  97.305    2

Does anyone know an efficient way to implement a rolling window that is specified as a time period?

Answer 1

Not sure whether you ended up finding a solution, but I recently asked a similar question. It was pointed out to me that pandas 0.19.0 now supports Time-aware Rolling.

I think you should be able to perform the rolling calculation over a 5-minute window with the following:

df1['VWAP'] = df1['Volume_Scaled_Price'].rolling('5min').sum() / df1['QTY'].rolling('5min').sum()

Also, here is the list of currently supported offset aliases:

http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
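For reference, a self-contained sketch of the approach above, assuming pandas 0.19.0 or later and the sample trades from the question. The timestamps are kept timezone-naive here for simplicity, and the `trades` frame and `window` variable are illustrative names rather than anything from the original post. With an offset string such as '5min', each row's window covers the preceding five minutes up to and including that row, so no resampling or empty rows are needed.

```python
import pandas as pd

# Sample trades from the question (timezone dropped to keep the sketch minimal).
trades = pd.DataFrame(
    {"PRICE": [97.295, 97.3, 97.3, 97.305, 97.305], "QTY": [2, 4, 3, 1, 2]},
    index=pd.to_datetime([
        "2015-09-08 10:24:16.671862751",
        "2015-09-08 10:25:33.952672310",
        "2015-09-08 10:38:30.840283893",
        "2015-09-08 11:00:47.536800660",
        "2015-09-08 11:00:47.536896273",
    ]),
)

# Price weighted by traded quantity, per row.
trades["Volume_Scaled_Price"] = trades["PRICE"] * trades["QTY"]

# Time-aware rolling: the window is an offset string, and the index must be
# a monotonic DatetimeIndex.
window = "5min"
trades["VWAP"] = (
    trades["Volume_Scaled_Price"].rolling(window).sum()
    / trades["QTY"].rolling(window).sum()
)

print(trades[["PRICE", "QTY", "VWAP"]])
```

Only the second and fifth rows have an earlier trade inside their 5-minute window, so the other rows simply report their own price.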

Window (bucketing) rolling by time in Pandas
