Predicting with Pandas OLS

Time: 2012-03-30 13:22:40

Tags: python pandas scikits

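I have been using the scikits.statsmodels OLS predict function to forecast fitted data, but would now like to switch to Pandas.

The documentation refers to OLS as well as to a function called y_predict, but I can't find any documentation on how to use it correctly.

For example: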

exogenous = {
    "1998": "4760","1999": "5904","2000": "4504","2001": "9808","2002": "4241","2003": "4086","2004": "4687","2005": "7686","2006": "3740","2007": "3075","2008": "3753","2009": "4679","2010": "5468","2011": "7154","2012": "4292","2013": "4283","2014": "4595","2015": "9194","2016": "4221","2017": "4520"}
endogenous = {
    "1998": "691", "1999": "1580", "2000": "80", "2001": "1450", "2002": "555", "2003": "956", "2004": "877", "2005": "614", "2006": "468", "2007": "191"}

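import numpy as np
from pandas import Series, ols  # pandas.ols is the old pandas 0.x API; it was removed in later releases

# y covers 1998-2007 only; x covers 1998-2017
ols_test = ols(y=Series(endogenous), x=Series(exogenous))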

However, while I can produce a fit:

>>> ols_test.y_fitted
1998     675.268299
1999     841.176837
2000     638.141913
2001    1407.354228
2002     600.000352
2003     577.521485
2004     664.681478
2005    1099.611292
2006     527.342854
2007     430.901264

the prediction gives nothing different:

>>> ols_test.y_predict
1998     675.268299
1999     841.176837
2000     638.141913
2001    1407.354228
2002     600.000352
2003     577.521485
2004     664.681478
2005    1099.611292
2006     527.342854
2007     430.901264

In scikits.statsmodels, one would do the following:

import scikits.statsmodels.api as sm
...
ols_model = sm.OLS(endogenous, np.column_stack(exogenous))
ols_results = ols_model.fit()
ols_pred = ols_model.predict(np.column_stack(exog_prediction_values))

How do I do this in Pandas, so that the endogenous data is forecast out to the limits of the exogenous data?

UPDATE: Thanks to Chang, the new version of Pandas (0.7.3) now has this functionality as standard.

1 Answer:

Answer 0: (score: 2)

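Is your question how to get the predicted y values of your regression? Or is it how to use the regression coefficients to get predicted y values for a different set of samples of the exogenous variables? pandas y_predict and y_fitted should give you the same values, and both should match what the predict method in scikits.statsmodels gives you.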

If you are looking for the regression coefficients, use ols_test.beta
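If what you want is the second case (applying the coefficients to the exogenous values for years that have no endogenous data), the arithmetic can be sketched by hand. The following is only an illustration, not the pandas API: it uses the standard library alone, assumes a single regressor plus an intercept, and the helper name fit_simple_ols is made up for this example. The data is the data from the question, with the strings converted to floats.

```python
# Data from the question (values are strings there, so convert with float()).
exogenous = {
    "1998": "4760", "1999": "5904", "2000": "4504", "2001": "9808", "2002": "4241",
    "2003": "4086", "2004": "4687", "2005": "7686", "2006": "3740", "2007": "3075",
    "2008": "3753", "2009": "4679", "2010": "5468", "2011": "7154", "2012": "4292",
    "2013": "4283", "2014": "4595", "2015": "9194", "2016": "4221", "2017": "4520"}
endogenous = {
    "1998": "691", "1999": "1580", "2000": "80", "2001": "1450", "2002": "555",
    "2003": "956", "2004": "877", "2005": "614", "2006": "468", "2007": "191"}


def fit_simple_ols(x, y):
    """Hypothetical helper: least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Fit only on the years where both x and y are known (1998-2007).
years_fit = sorted(endogenous)
x_fit = [float(exogenous[year]) for year in years_fit]
y_fit = [float(endogenous[year]) for year in years_fit]
intercept, slope = fit_simple_ols(x_fit, y_fit)

# Apply the coefficients to every exogenous value, including 2008-2017.
predicted = {year: intercept + slope * float(exogenous[year])
             for year in sorted(exogenous)}

for year in sorted(predicted):
    print(year, round(predicted[year], 6))
```

For 1998-2007 this should reproduce the y_fitted values shown in the question; the rows for 2008-2017 are the out-of-sample predictions. With more than one regressor you would need a matrix solve instead of the closed-form slope used here.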