Weighted data problem: the means look fine, but covar and std look wrong. How do I adjust?

Date: 2015-06-17 22:27:27

Tags: python math numpy pandas covariance

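I am trying to apply a weighting filter to my data, rather than using the raw data, before calculating the stats: mu, std and covar. But the results clearly need adjusting.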

# generate some data and a filter
import numpy as np
from pandas import DataFrame

f_n = 100  # int, because np.random.rand rejects float dimensions
np.random.seed(seed=101)
foo = np.random.rand(f_n, 3)
foo = DataFrame(foo).add(1).pct_change()
f_filter = np.arange(f_n, 0, -1, dtype=float)
f_filter = 1.0 / (f_filter ** (f_filter / f_n))
# normalise the filter so the weights average to 1 ... This could be where I'm going wrong?
f_filter = f_filter * (f_n / f_filter.sum())

Now we are ready to look at some results (weighted first, unweighted second):

print(foo.mul(f_filter, axis=0).mean())
print(foo.mean())

0    0.039147
1    0.039013
2    0.037598
dtype: float64
0    0.035006
1    0.042244
2    0.041956
dtype: float64

The means look about right, but when we look at covar and std, they differ significantly in both scale and direction:

print(foo.mul(f_filter, axis=0).cov())
print(foo.cov())

          0         1         2
0  0.124766 -0.038954  0.027256
1 -0.038954  0.204269  0.056185
2  0.027256  0.056185  0.203934

          0         1         2
0  0.070063 -0.014926  0.010434
1 -0.014926  0.099249  0.015573
2  0.010434  0.015573  0.087060

print(foo.mul(f_filter, axis=0).std())
print(foo.std())

0    0.353223
1    0.451961
2    0.451590
dtype: float64
0    0.264694
1    0.315037
2    0.295060
dtype: float64

Any ideas why, and how can we adjust the filter or the covar matrix to make them more comparable?

1 answer:

Answer 0 (score: 1)

The problem is your weighting function. (Also, do you want Gaussian random numbers or a uniform r.v.?) See this plot:

import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame

f_n = 100  # int, because np.random.rand rejects float dimensions
np.random.seed(seed=101)
# ??? do you want a uniform random variable? or is this just a typo and you want a normal random variable?
foo = np.random.rand(f_n, 3)
foo = DataFrame(foo)
f_filter = np.arange(f_n, 0, -1, dtype=float)

# here is the problem: uneven weights create an artificial trend, making the data non-stationary. covariance only works for stationary data.
# =============================================
f_filter = 1.0 / (f_filter ** (f_filter / f_n))

# plot the weights against row index
fig, ax = plt.subplots()
ax.plot(f_filter)

[Figure: plot of f_filter against row index]

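The uneven weights create an artificial trend (your random numbers are all positive, since they are uniform), which makes the weighted series non-stationary. Covariance only works for stationary data. Take a look at the resulting weighted data: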

[Figure: plot of the resulting weighted data]
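One possible way to get comparable numbers, assuming what you actually want is a weighted mean and a weighted covariance, is to leave the observations alone and pass the weights to the estimator instead of multiplying them into the data. The sketch below is only an illustration under that assumption: it uses plain uniform data as a stand-in (not your pct_change series), and the names x, w, mu_w, cov_w and cov_scaled are mine. It relies on np.average with weights and np.cov with aweights.

```python
import numpy as np

rng = np.random.RandomState(101)
n = 100
x = rng.rand(n, 3)                      # stand-in data: uniform on [0, 1)

k = np.arange(n, 0, -1, dtype=float)
w = 1.0 / k ** (k / n)                  # same weight shape as f_filter in the question

# Weighted mean: sum(w * x) / sum(w), computed per column
mu_w = np.average(x, axis=0, weights=w)

# Weighted covariance: the weights apply to the squared deviations
# from mu_w, not to the observations themselves
cov_w = np.cov(x, rowvar=False, aweights=w)
std_w = np.sqrt(np.diag(cov_w))

# For comparison, the approach from the question: scale each row by a
# weight normalised to average 1, then take ordinary statistics
scaled = x * (w * n / w.sum())[:, None]
mean_scaled = scaled.mean(axis=0)       # coincides with mu_w
cov_scaled = np.cov(scaled, rowvar=False)  # not a weighted covariance

print(mu_w)
print(std_w)
print(np.sqrt(np.diag(cov_scaled)))
```

The row-scaling approach reproduces the weighted mean exactly, which is why the means in the question look fine. The covariance of the scaled rows also picks up the spread of the weights themselves, so it is not on the same scale as the unweighted covariance, whereas cov_w is.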