I am trying the following code to estimate the variance of a sample and to compare it with the numpy.var implementation.
import numpy as np

def rcov(xj, state):
    # state = (count, running mean, running sum of squared deviations)
    i, Mi, Si = state
    j = i + 1
    Mj = Mi + (xj - Mi) / j
    Sj = Si + (i / j) * (xj - Mi) ** 2
    return (j, Mj, Sj)

def mycov(X):
    s = (0., 0, 0)
    for i in range(len(X)):
        s = rcov(X[i], s)
    return s[-1] / (len(X) - 1)  # sample variance (ddof=1)

X = np.random.rand(1000000)
N = 1e+15  # large offset added to every sample
print()
print('Sample (co)variance with 1 dof.')
print('-------------------------------')
print('np.var(X) ', np.var(X, ddof=1))
print('mycov(X) ', mycov(X))
print('np.var(X+N) ', np.var(X+N, ddof=1))
print('mycov(X+N) ', mycov(X+N))
Output:
Sample (co)variance with 1 dof.
-------------------------------
np.var(X) 0.0833039106062
mycov(X) 0.0833039106062
np.var(X+N) 19208514.2744
mycov(X+N) 0.0859324294763
This raises two questions:
Answer 0 (score: 3)
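For speed, NumPy does not check for arithmetic overflow or underflow. It is the user's responsibility to choose dtypes wide enough to keep the required precision throughout the whole computation.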
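One way to see why float64 is not wide enough here is to look at the gap between adjacent representable values near N. The following is only an illustrative sketch, not part of the original answer: the seed and the 10-element sample are arbitrary choices, and np.spacing is used to query the gap.

```python
import numpy as np

N = 1e15

# Gap between adjacent representable float64 values near N.
gap = np.spacing(np.float64(N))
print("float64 spacing near 1e15:", gap)

# Adding N snaps every value of X onto that grid, so subtracting N
# afterwards does not recover the original X.
rng = np.random.default_rng(0)  # seed chosen arbitrarily for the sketch
X = rng.random(10)
recovered = (X + N) - N

print("max round-trip error:", np.max(np.abs(recovered - X)))
print("all values on the grid:", np.all(recovered % gap == 0))
```

Since the values in X lie in [0, 1), a grid this coarse leaves only a handful of distinct values in X+N, which is the precision loss the variance computations then have to work with.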
Using NumPy 1.8 and choosing dtype longdouble for X instead of the default float64,
X = np.random.rand(1000000).astype('longdouble')
yields
Sample (co)variance with 1 dof.
-------------------------------
np.var(X) 0.0832200737104
mycov(X) 0.0832200737104
np.var(X+N) 0.0832200805148
mycov(X+N) 0.0832199500372
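Whether longdouble actually gains precision depends on the platform: on x86 Linux it is typically 80-bit extended precision, while some builds (for example Windows with MSVC) map it to the same 64-bit format as float64. A quick check, as a sketch using np.finfo (the variable names are just for illustration):

```python
import numpy as np

# Floating-point characteristics of the two dtypes on this machine.
f64 = np.finfo(np.float64)
ld = np.finfo(np.longdouble)

print("float64    mantissa bits:", f64.nmant, " eps:", f64.eps)
print("longdouble mantissa bits:", ld.nmant, " eps:", ld.eps)

# If longdouble is no wider than float64, switching dtypes cannot help.
if ld.nmant <= f64.nmant:
    print("longdouble offers no extra precision on this platform")
```

If the two mantissa widths match, the longdouble run above would be expected to behave like the float64 one.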