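I want to apply a weighted rolling average to a large time series stored as a pandas DataFrame, where the weight is different for each day. Here is a subset of the DataFrame:
df: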
Date v_std vertical
2010-10-01 1.909 545.231
2010-10-02 1.890 538.610
2010-10-03 1.887 542.759
2010-10-04 1.942 545.221
2010-10-05 1.847 536.832
2010-10-06 1.884 538.858
2010-10-07 1.864 538.017
2010-10-08 1.833 540.737
2010-10-09 1.847 537.906
2010-10-10 1.881 538.210
2010-10-11 1.868 544.238
2010-10-12 1.856 534.878
I want to get the rolling average of the vertical column, using v_std as the weights. I have been using this weighted-average function:
def wavg(group, avg_name, weight_name):
    d = group[avg_name]
    w = group[weight_name]
    try:
        return (d * w).sum() / w.sum()
    except ZeroDivisionError:
        return d.mean()
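But I can't figure out how to turn this into a rolling weighted average. I assume it would be something like
df.rolling(window = 7).apply(wavg, "vertical", "v_std")
Or should I use rolling_apply? Or do I have to write a new function altogether? Thanks!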
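Answer 0 (score: 0)
I believe you may be looking for the win_type parameter of rolling(). You can specify different window types, such as "triang" (triangular)...
You can check the parameters at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html
Answer 1 (score: 0)
This is how I implemented the weighted mean. It would be nice if there were a pairwise_apply for this sort of thing.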
# Note: these are private pandas internals, so the import may fail on other pandas versions.
from pandas.core.window import _flex_binary_moment, _Rolling_and_Expanding

def weighted_mean(self, weights, **kwargs):
    weights = self._shallow_copy(weights)
    window = self._get_window(weights)

    def _get_weighted_mean(X, Y):
        X = X.astype('float64')
        Y = Y.astype('float64')
        # Rolling sum with the same window settings as the calling object
        sum_f = lambda x: x.rolling(window, self.min_periods, center=self.center).sum(**kwargs)
        # Weighted mean = rolling sum of (value * weight) / rolling sum of weights
        return sum_f(X * Y) / sum_f(Y)

    return _flex_binary_moment(self._selected_obj, weights._selected_obj,
                               _get_weighted_mean, pairwise=True)

# Attach the function as a method on rolling/expanding objects
_Rolling_and_Expanding.weighted_mean = weighted_mean

df['mean'] = df['vertical'].rolling(window=7).weighted_mean(df['v_std'])
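For comparison, the same ratio of rolling sums can be written with public pandas calls only, without patching any internals. This is a minimal sketch rather than a drop-in replacement: the helper name rolling_weighted_mean and the wmean column are made up for illustration, the data is the first eight rows from the question, and windows whose weights sum to zero are not handled specially.

```python
import pandas as pd

# First eight rows of the sample data from the question.
df = pd.DataFrame(
    {
        "v_std": [1.909, 1.890, 1.887, 1.942, 1.847, 1.884, 1.864, 1.833],
        "vertical": [545.231, 538.610, 542.759, 545.221,
                     536.832, 538.858, 538.017, 540.737],
    },
    index=pd.date_range("2010-10-01", periods=8, freq="D"),
)


def rolling_weighted_mean(values, weights, window):
    """Hypothetical helper: rolling mean of `values`, weighted row by row."""
    # Rolling sum of value * weight over each window.
    weighted_sum = (values * weights).rolling(window).sum()
    # Rolling sum of the weights over the same window.
    weight_sum = weights.rolling(window).sum()
    return weighted_sum / weight_sum


df["wmean"] = rolling_weighted_mean(df["vertical"], df["v_std"], window=7)
print(df["wmean"])
```

With the default min_periods, the first six rows are NaN because their windows are incomplete. Since everything is done with vectorised rolling sums, this should scale to a large series better than a Python-level apply.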
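Answer 2 (score: 0)
The code below should do it (sorry for my long naming convention). It is fairly simple: it just takes advantage of rolling.apply in newer versions of pandas, which added the raw parameter. With raw=False each window is passed as a Series instead of a one-dimensional array, so it can carry more information (here, the weights in its index):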
import numpy as np

def get_weighted_average(dataframe, window, columnname_data, columnname_weights):
    # Move the weights into the index so that each window carries them along
    processed_dataframe = dataframe.loc[:, [columnname_data, columnname_weights]].set_index(columnname_weights)

    def get_mean_withweights(processed_dataframe_windowed):
        return np.average(a=processed_dataframe_windowed, weights=processed_dataframe_windowed.index)

    return processed_dataframe.rolling(window=window).apply(func=get_mean_withweights, raw=False)
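A variant of the same raw=False idea keeps the original date index and uses each window's index to look the weights up in the source frame. This is only a sketch under some assumptions: the function name weighted_mean_of_window and the wmean column are made up for illustration, the index must be unique for the lookup to line up, and a window of 3 is used just to keep the example short.

```python
import numpy as np
import pandas as pd

# First six rows of the sample data from the question.
df = pd.DataFrame(
    {
        "v_std": [1.909, 1.890, 1.887, 1.942, 1.847, 1.884],
        "vertical": [545.231, 538.610, 542.759, 545.221, 536.832, 538.858],
    },
    index=pd.date_range("2010-10-01", periods=6, freq="D"),
)


def weighted_mean_of_window(window_values):
    # With raw=False the window arrives as a Series, so its index can be
    # used to fetch the matching weights from the original frame.
    window_weights = df.loc[window_values.index, "v_std"]
    return np.average(window_values, weights=window_weights)


df["wmean"] = (
    df["vertical"].rolling(window=3).apply(weighted_mean_of_window, raw=False)
)
print(df["wmean"])
```

The result stays indexed by date, so it can be assigned straight back to the frame. The function is called once per window in Python, so expect it to be slow on a long series.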
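Answer 3 (score: 0)
Here is my solution for a rolling weighted average using pandas' _Rolling_and_Expanding:
First, I added a new column for the product:
df['mul'] = df['value'] * df['weight']
Then write the function you want to apply: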
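import pandas as pd
# Note: _Rolling_and_Expanding is a private pandas class, so the import may fail on other pandas versions.
from pandas.core.window import _Rolling_and_Expanding

def weighted_average(x):
    # x is the rolling object; x['mul'].sum() and x['weight'].sum() are rolling sums
    d = []
    d.append(x['mul'].sum() / x['weight'].sum())
    return pd.Series(d, index=['wavg'])

_Rolling_and_Expanding.weighted_average = weighted_average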
Apply the function with the following line:
result = mean_per_group.rolling(window=7).weighted_average()
Then you can get the series you want with:
result['wavg']