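I get the following output from a pooled OLS regression in Pandas. The only problem is that I'm not sure where the intercept is. In a regression there is always an intercept, usually listed before the exogenous variables, i.e. Y = a + βx1 + βx2 + error_term
I don't see it in my regression output. I followed ayhan's suggestion, X = add_constant(X), but somehow I feel I've messed something up with the syntax (in an obvious way). I know it's not rocket science. Can someone tell me what I'm missing?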
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import statsmodels.formula.api as sm
from sklearn.linear_model import LinearRegression
import scipy, scipy.stats
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
from statsmodels.api import add_constant
X = add_constant(X)
Y = df['billsum_support']
X = df[['direct_expenditures','indirect_expenditures', 'years_exp', 'leg_totalbills',\
'log_diff_rgdp', 'unemployment', 'expendituresfor']]
result = sm.OLS( Y, X ).fit()
result.summary()
OLS Regression Results
Dep. Variable:      billsum_support     R-squared:             0.663
Model:              OLS                 Adj. R-squared:        0.663
Method:             Least Squares       F-statistic:           3932.
Date:               Sun, 08 May 2016    Prob (F-statistic):    0.00
Time:               22:38:33            Log-Likelihood:        -12561.
No. Observations:   12008               AIC:                   2.513e+04
Df Residuals:       12002               BIC:                   2.518e+04
Df Model:           6
Covariance Type:    nonrobust

                        coef         std err    t        P>|t|   [95.0% Conf. Int.]
direct_expenditures     4.575e-05    4.02e-06   11.377   0.000   3.79e-05   5.36e-05
indirect_expenditures   -2.147e-05   6.93e-06   -3.099   0.002   -3.5e-05   -7.89e-06
years_exp               0.0030       0.001      5.595    0.000   0.002      0.004
leg_totalbills          0.0052       0.000      11.160   0.000   0.004      0.006
log_diff_rgdp           1.0325       0.178      5.805    0.000   0.684      1.381
unemployment            0.1052       0.001      70.744   0.000   0.102      0.108
expendituresfor         2.428e-05    3.57e-06   6.797    0.000   1.73e-05   3.13e-05

Omnibus:          2994.033    Durbin-Watson:       0.837
Prob(Omnibus):    0.000       Jarque-Bera (JB):    19159.354
Skew:             1.042       Prob(JB):            0.00
Kurtosis:         8.827       Cond. No.            1.54e+16
Answer 0 (score: 5)
You need to explicitly tell statsmodels to fit an intercept. Use statsmodels.api.add_constant to update your independent variables:
from statsmodels.api import add_constant
Y = df['billsum_support']
X = df[['direct_expenditures','indirect_expenditures', 'years_exp', 'leg_totalbills',\
'log_diff_rgdp', 'unemployment', 'expendituresfor']]
X = add_constant(X)
result = sm.OLS( Y, X ).fit()