我无法在statsmodels
版本0.6.1中从least squares fits获得合理的预测行为。它似乎没有提供合理的价值。
考虑以下数据
import numpy as np
xx = np.array([1.1,2.2,3.3,4.4]) # Independent variable
XX = sm.add_constant(xx) # Include constant for matrix fitting in statsmodels
yy = np.array([2,1,5,6]) # Dependent variable
ww = np.array([0.1,1,3,0.5]) # Weights to try
wn = ww/ww.sum() # Normalized weights
zz = 1.9 # Independent variable value to predict for
我们可以使用numpy
进行加权拟合和预测
np_unw_value = np.polyval(np.polyfit(xx, yy, deg=1, w=1+0*ww), zz)
print("Unweighted fit prediction from numpy.polyval is {sp}".format(sp=np_unw_value))
我们发现预测为2.263636。
作为一项完整性检查,我们还可以看到 R 对此事的评论
import pandas as pd
import rpy2.robjects
from rpy2.robjects.packages import importr
import rpy2.robjects.pandas2ri
rpy2.robjects.pandas2ri.activate()
pdf = pd.DataFrame({'x':xx, 'y':yy, 'w':wn})
pdz = pd.DataFrame({'x':[zz], 'y':[np.Inf]})
rfit = rpy2.robjects.r.lm('y~x', data=pdf, weights=1+0*pdf['w']**2)
rpred = rpy2.robjects.r.predict(rfit, pdz)[0]
print("Unweighted fit prediction from R is {sp}".format(sp=rpred))
我们再次发现2.263636的预测值。我的问题是我们不从statmodels OLS
获得结果import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std
owls = sm.OLS(yy, XX).fit()
sm_value_u, iv_lu, iv_uu = wls_prediction_std(owls, exog=np.array([[1,zz]]))
sm_unw_v = sm_value_u[0]
print("Unweighted OLS fit prediction from statsmodels.wls_prediction_std is {sp}".format(sp=sm_unw_v))
相反,我获得了一个值1.695814(类似的事情发生在WLS()
)。要么存在错误,要么使用statsmodels
进行预测会让我发现一些晦涩难懂的伎俩。发生了什么事?
答案 0 :(得分:2)
结果类有一个predict
方法,可以预测解释变量的新值:
>>> print(owls.predict(np.array([[1,zz]])))
[ 2.26363636]
wls_prediction_std
的第一次返回是预测的标准误差,而不是预测本身。
>>> help(wls_prediction_std)
Help on function wls_prediction_std in module statsmodels.sandbox.regression.predstd:
wls_prediction_std(res, exog=None, weights=None, alpha=0.05)
calculate standard deviation and confidence interval for prediction
applies to WLS and OLS, not to general GLS,
that is independently but not identically distributed observations
Parameters
----------
res : regression result instance
results of WLS or OLS regression required attributes see notes
exog : array_like (optional)
exogenous variables for points to predict
weights : scalar or array_like (optional)
weights as defined for WLS (inverse of variance of observation)
alpha : float (default: alpha = 0.05)
confidence level for two-sided hypothesis
Returns
-------
predstd : array_like, 1d
standard error of prediction
same length as rows of exog
interval_l, interval_u : array_like
lower und upper confidence bounds
沙箱功能将被结果类的新方法get_prediction
取代,该方法提供预测和额外结果,如标准偏差和置信度以及预测间隔。