Question

I am calculating the VWAP (volume-weighted average price) of an instrument as follows:

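import numpy as np

def vwap(df):
    # Use only rows where both price and volume are present.
    df = df[['Price', 'Volume']].dropna()
    numerator = (df.Price * df.Volume).sum()  # total traded value
    denominator = df.Volume.sum()             # total traded volume
    # Empty or zero-volume bins have no defined VWAP.
    if denominator == 0:
        return np.nan
    return numerator / denominator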

df.resample('10T', label='right').apply(vwap)

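For minute data spanning a few years, this takes about 5 minutes. I have been looking into Dask and Swifter, but neither seems to offer parallelization for 'DatetimeIndexResampler' objects, presumably because these methods have to know in advance how to partition the data.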

Does anyone have any intuition on how to speed this up?

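Note that the original df contains sporadic data depending on trading hours, so some 10-minute intervals may contain no data at all, while others may contain several rows.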
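One direction that might avoid the per-bin Python call, offered as a sketch and not a measured result: the VWAP of a bin is the sum of Price * Volume divided by the sum of Volume, so both sums can be taken with the built-in resample sum and divided afterwards. The function name `vwap_resampled` and the `Notional` column are hypothetical names introduced for illustration, and the rule is written as '10min', which is the same frequency as '10T'. Unlike the original function, this version returns NaN when the total volume of a bin is zero, which also covers the empty intervals.

```python
import numpy as np
import pandas as pd


def vwap_resampled(df, rule="10min", label="right"):
    """Per-bin VWAP from two resampled sums instead of resample().apply()."""
    # Drop rows where either field is missing, as the original vwap() does.
    clean = df[["Price", "Volume"]].dropna()
    # Traded value per row, computed once for the whole frame.
    clean = clean.assign(Notional=clean["Price"] * clean["Volume"])
    # Built-in sum per bin; empty bins come back as 0 for both columns.
    sums = clean[["Notional", "Volume"]].resample(rule, label=label).sum()
    # Mask zero-volume bins (including empty ones) so the ratio is NaN there.
    volume = sums["Volume"].where(sums["Volume"] != 0)
    return sums["Notional"] / volume
```

Calling `vwap_resampled(df)` should return a Series indexed by the right edge of each 10-minute bin. Whether this is actually faster on the data described here is not verified.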

So if I do this: undl.resample('10T', label='right').swifter.apply(last_vwap)

I get:

<ipython-input-162-c879f56543fb> in <module>
----> 1 undl.resample('10T', label='right').swifter.apply(last_vwap)

~/.local/lib/python3.6/site-packages/pandas/core/resample.py in __getattr__(self, attr)
     95             return self[attr]
     96 
---> 97         return object.__getattribute__(self, attr)
     98 
     99     @property

AttributeError: 'DatetimeIndexResampler' object has no attribute 'swifter'

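But if I convert it to a dataframe with dask, I don't know how many partitions to use, because the number of ticks in each 10-minute interval is inconsistent.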

Speeding up pandas VWAP

0 answers: