I want to weight the values of a time series / DataFrame with a custom array, as in "How do I calculate a rolling mean with custom weights in pandas?":
import pandas as pd
ser = pd.Series([1,1,1], index=pd.date_range('1/1/2000', periods=3))
print ser
rm1 = pd.rolling_window(ser, window=[2,2,2], mean=False)
rm2 = pd.rolling_window(ser, window=[2,2,2]) #, mean=True
print rm1
#
#2000-01-01 NaN
#2000-01-02 NaN
#2000-01-03 6
#Freq: D, dtype: float64
print rm2
#
#2000-01-01 NaN
#2000-01-02 NaN
#2000-01-03 1
#Freq: D, dtype: float64
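But this no longer seems to exist in pandas 0.20.3. How can I do this?
At the moment I get the error
ValueError: window must be an integer
Answer 0 (score: 0)
I can't think of a simple solution that uses only the new rolling method. The only way I can see is to build a data frame and add a new column holding the weighted values.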
>>> df = pd.DataFrame([1,1,1], index=pd.date_range('1/1/2000', periods=3), columns=['value'])
>>> df['weight'] = [2, 2, 2]
>>> df['weighted'] = df['value'] * df['weight']
>>> df
            value  weight  weighted
2000-01-01      1       2         2
2000-01-02      1       2         2
2000-01-03      1       2         2
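Computing the sum is straightforward. Once the data frame exists, call the rolling method followed by sum. Judging from the example you gave, the window size is 3.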
>>> df_rolled = df.rolling(3).sum()
>>> df_rolled['weighted']
2000-01-01 NaN
2000-01-02 NaN
2000-01-03 6.0
Freq: D, Name: weighted, dtype: float64
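Computing the weighted mean, however, needs one more column: take the rolling sum of the weighted column and divide it by the rolling sum of the weight column. This ensures you get a weighted mean rather than the mean of the weighted values, and the two can differ a lot.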
>>> df_rolled['w_mean'] = df_rolled['weighted'] / df_rolled['weight']
>>> df_rolled['w_mean']
2000-01-01 NaN
2000-01-02 NaN
2000-01-03 1.0
Freq: D, Name: w_mean, dtype: float64
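To make that distinction concrete, here is a minimal sketch with made-up numbers (the arrays `values` and `weights` are purely illustrative and not part of the example above). It contrasts the weighted mean, sum(w*x) / sum(w), with the plain mean of the weighted values, sum(w*x) / n.

```python
import numpy as np

# Illustrative numbers only, not taken from the question.
values = np.array([10.0, 20.0])
weights = np.array([1.0, 3.0])

# Weighted mean: divide the weighted sum by the sum of the weights.
weighted_mean = np.sum(values * weights) / np.sum(weights)   # 70 / 4 = 17.5

# Mean of the weighted values: divide the weighted sum by the number of points.
mean_of_weighted = np.mean(values * weights)                 # 70 / 2 = 35.0

# np.average with weights computes the weighted mean as well.
assert np.isclose(np.average(values, weights=weights), weighted_mean)
print(weighted_mean, mean_of_weighted)
```

The two only agree when the weights sum to the number of points, which is why dividing by the rolled weight column matters.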
Here is another example to check that the solution works, and it does:
>>> df['value'] = [2, 4, 6]
>>> df['weight'] = [1, 3, 5]
>>> df['weighted'] = df['value'] * df['weight']
>>> df
            value  weight  weighted
2000-01-01      2       1         2
2000-01-02      4       3        12
2000-01-03      6       5        30
>>> df_rolled = df.rolling(3).sum()
>>> df_rolled['weighted'] # weighted sum
2000-01-01 NaN
2000-01-02 NaN
2000-01-03 44.0
Freq: D, Name: weighted, dtype: float64
>>> df_rolled['w_mean'] = df_rolled['weighted'] / df_rolled['weight']
>>> df_rolled['w_mean'] # weighted mean
2000-01-01 NaN
2000-01-02 NaN
2000-01-03 4.888889
Freq: D, Name: w_mean, dtype: float64
>>> df_rolled = df.rolling(2).sum() # window size 2
>>> df_rolled['weighted']
2000-01-01 NaN
2000-01-02 14.0
2000-01-03 42.0
Freq: D, Name: weighted, dtype: float64
>>> df_rolled['w_mean'] = df_rolled['weighted'] / df_rolled['weight']
>>> df_rolled['w_mean']
2000-01-01 NaN
2000-01-02 3.50
2000-01-03 5.25
Freq: D, Name: w_mean, dtype: float64
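One caveat worth spelling out: the column approach above attaches each weight to a row, whereas `pd.rolling_window` in the question applied the weights by position inside each window. When the window spans the whole frame the two coincide, but for shorter windows they generally differ. Below is a minimal sketch of the positional variant, assuming a plain `rolling(...).apply(...)` is acceptable. The helper `rolling_weighted` and the series `example` are hypothetical names introduced here, not pandas API, and the `mean=True` branch assumes normalisation by the sum of the weights, which matches the outputs shown in the question.

```python
import numpy as np
import pandas as pd

def rolling_weighted(series, weights, mean=True):
    """Apply `weights` by position within each rolling window (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)

    def _apply(window):
        # np.asarray copes with both ndarray and Series inputs, so the helper
        # does not depend on the `raw` argument of newer pandas versions.
        total = np.dot(np.asarray(window, dtype=float), weights)
        return total / weights.sum() if mean else total

    # The window length is taken from the number of weights.
    return series.rolling(window=len(weights)).apply(_apply)

example = pd.Series([2, 4, 6], index=pd.date_range('1/1/2000', periods=3))
weighted_sum = rolling_weighted(example, [1, 3, 5], mean=False)
weighted_mean = rolling_weighted(example, [1, 3, 5])
print(weighted_sum.iloc[-1])   # 44.0
print(weighted_mean.iloc[-1])  # 44 / 9, about 4.888889
```

With a window of 3 over three rows this reproduces the 44.0 and 4.888889 obtained above. The cost is that `apply` runs a Python function per window, so it will be slower than the column-based sums on long series.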
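Answer 1 (score: 0)
I was specifically interested in aging with a half-Gaussian function, so that older observations count for less. This seems to work: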
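import numpy as np
from scipy.stats import norm

def half_gaussian_convolution(values):
    # Offsets -(n-1)..0 put the most recent observation at the Gaussian peak.
    offsets = np.arange(-len(values) + 1, 1)
    # 1.6448536269514722 is the 95th-percentile z-score of the standard normal,
    # so the oldest point in the window sits about 1.645 sigma from the peak.
    normal_weighting = norm.pdf(offsets, scale=(len(values) - 1) / 1.6448536269514722)
    normal_weighting = normal_weighting / np.sum(normal_weighting)  # weights sum to 1
    return np.sum(normal_weighting * values)

ser.rolling(window=4, center=False).apply(func=half_gaussian_convolution)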
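Note that the `ser` from the question has only three rows, so a window of 4 returns nothing but NaN there. The sketch below tries the same weighting on a longer series and exposes the weights for inspection. The names `half_gaussian_weights`, `longer` and `aged` are hypothetical, and `norm.ppf(0.95)` stands in for the hard-coded constant on the assumption that this is where it came from.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

def half_gaussian_weights(n):
    """Normalised half-Gaussian weights for a window of length n (hypothetical helper)."""
    offsets = np.arange(-n + 1, 1)  # -(n-1), ..., 0; 0 is the newest point
    weights = norm.pdf(offsets, scale=(n - 1) / norm.ppf(0.95))
    return weights / weights.sum()

weights = half_gaussian_weights(4)
print(weights)  # increasing towards the most recent observation

# Eight made-up daily values, long enough for a window of 4 to fill up.
longer = pd.Series(np.arange(1.0, 9.0), index=pd.date_range('1/1/2000', periods=8))
aged = longer.rolling(window=4).apply(lambda w: np.dot(np.asarray(w), weights))
print(aged)
```

Precomputing the weights once avoids evaluating `norm.pdf` for every window, which is the main design difference from the function above. The results should be the same as long as the window length is fixed.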