Getting correct exogenous least squares predictions in Python statsmodels

Asked: 2016-02-16 16:23:54

Tags: python pandas statsmodels

I cannot get sensible predictions from least squares fits in statsmodels version 0.6.1; the values it returns do not look reasonable.

Consider the following data:

import numpy as np
import statsmodels.api as sm  # needed for sm.add_constant below

xx = np.array([1.1,2.2,3.3,4.4])  # Independent variable
XX = sm.add_constant(xx)  # Include constant for matrix fitting in statsmodels
yy = np.array([2,1,5,6])  # Dependent variable
ww = np.array([0.1,1,3,0.5])  # Weights to try
wn = ww/ww.sum()  # Normalized weights
zz = 1.9  # Independent variable value to predict for

We can fit and predict with numpy (the weights are all set to 1 here, so the fit is effectively unweighted):

np_unw_value = np.polyval(np.polyfit(xx, yy, deg=1, w=1+0*ww), zz)
print("Unweighted fit prediction from numpy.polyval is {sp}".format(sp=np_unw_value))

We find a prediction of 2.263636.

As a sanity check, we can also see what R has to say about the matter:

import pandas as pd
import rpy2.robjects
from rpy2.robjects.packages import importr
import rpy2.robjects.pandas2ri

rpy2.robjects.pandas2ri.activate()
pdf = pd.DataFrame({'x':xx, 'y':yy, 'w':wn})
pdz = pd.DataFrame({'x':[zz], 'y':[np.inf]})  # np.Inf alias was removed in NumPy 2.0
rfit = rpy2.robjects.r.lm('y~x', data=pdf, weights=1+0*pdf['w']**2)
rpred = rpy2.robjects.r.predict(rfit, pdz)[0]
print("Unweighted fit prediction from R is {sp}".format(sp=rpred))

Again we find a predicted value of 2.263636. My problem is with the result we get from statsmodels OLS:

import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std

owls = sm.OLS(yy, XX).fit()
sm_value_u, iv_lu, iv_uu = wls_prediction_std(owls, exog=np.array([[1,zz]]))
sm_unw_v = sm_value_u[0]
print("Unweighted OLS fit prediction from statsmodels.wls_prediction_std is {sp}".format(sp=sm_unw_v))

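Instead, I obtain a value of 1.695814 (a similar thing happens with WLS()). Either there is a bug, or predicting with statsmodels involves some obscure trick that I am missing. What is going on?

1 answer:

Answer 0 (score: 2)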

The results class has a predict method that computes predictions for new values of the explanatory variables:

>>> print(owls.predict(np.array([[1,zz]])))
[ 2.26363636]

The first return value of wls_prediction_std is the standard error of the prediction, not the prediction itself.
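As a cross-check on that reading, the 1.695814 from the question can be reproduced by hand as the textbook standard error of a new observation, sqrt(s^2 * (1 + x0' (X'X)^-1 x0)). The sketch below uses only numpy; the variable names are illustrative, and it assumes the usual OLS formula with an intercept rather than anything specific to statsmodels.

```python
import numpy as np

xx = np.array([1.1, 2.2, 3.3, 4.4])
yy = np.array([2.0, 1.0, 5.0, 6.0])

# Design matrix with the constant in the first column (same layout as sm.add_constant)
X = np.column_stack([np.ones_like(xx), xx])
x0 = np.array([1.0, 1.9])  # new point: constant first, then x

# Ordinary least squares coefficients
beta, *_ = np.linalg.lstsq(X, yy, rcond=None)

# Residual variance with n - k degrees of freedom
resid = yy - X @ beta
s2 = resid @ resid / (len(yy) - X.shape[1])

# Point prediction and standard error of a new observation at x0
prediction = x0 @ beta
pred_se = np.sqrt(s2 * (1.0 + x0 @ np.linalg.inv(X.T @ X) @ x0))

print("prediction: {:.6f}".format(prediction))
print("standard error of prediction: {:.6f}".format(pred_se))
```

The first number agrees with the numpy and R results in the question, and the second agrees with the value the question took to be the prediction.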

>>> help(wls_prediction_std)
Help on function wls_prediction_std in module statsmodels.sandbox.regression.predstd:

wls_prediction_std(res, exog=None, weights=None, alpha=0.05)
    calculate standard deviation and confidence interval for prediction

    applies to WLS and OLS, not to general GLS,
    that is independently but not identically distributed observations

    Parameters
    ----------
    res : regression result instance
        results of WLS or OLS regression required attributes see notes
    exog : array_like (optional)
        exogenous variables for points to predict
    weights : scalar or array_like (optional)
        weights as defined for WLS (inverse of variance of observation)
    alpha : float (default: alpha = 0.05)
        confidence level for two-sided hypothesis

    Returns
    -------
    predstd : array_like, 1d
        standard error of prediction
        same length as rows of exog
    interval_l, interval_u : array_like
        lower und upper confidence bounds

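The sandbox function will be superseded by get_prediction, a new method of the results classes, which provides the prediction together with additional results such as standard errors and confidence and prediction intervals.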

http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.RegressionResults.get_prediction.html