VIF by coef in OLS Regression Results (Python)

Date: 2017-02-15 19:18:41

Tags: python linear-regression data-science

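I am trying to print the VIF (variance inflation factor) for each coef. However, I can't seem to find any statsmodels documentation showing how to do it. I have a model with n variables that I need to process, and a single multicollinearity value for all variables doesn't help me remove the ones with the highest collinearity.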

This looks like an answer:

https://stats.stackexchange.com/questions/155028/how-to-systematically-remove-collinear-variables-in-python

But how do I run it against this dataset?

http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv

Below are the code and the summary output, which is where I am now.

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# read data into a DataFrame
data = pd.read_csv('somepath', index_col=0)
print(data.head())

#multiregression
lm = smf.ols(formula='Sales ~ TV + Radio + Newspaper', data=data).fit()
print(lm.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.897
Model:                            OLS   Adj. R-squared:                  0.896
Method:                 Least Squares   F-statistic:                     570.3
Date:                Wed, 15 Feb 2017   Prob (F-statistic):           1.58e-96
Time:                        13:28:29   Log-Likelihood:                -386.18
No. Observations:                 200   AIC:                             780.4
Df Residuals:                     196   BIC:                             793.6
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.9389      0.312      9.422      0.000         2.324     3.554
TV             0.0458      0.001     32.809      0.000         0.043     0.049
Radio          0.1885      0.009     21.893      0.000         0.172     0.206
Newspaper     -0.0010      0.006     -0.177      0.860        -0.013     0.011
==============================================================================
Omnibus:                       60.414   Durbin-Watson:                   2.084
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              151.241
Skew:                          -1.327   Prob(JB):                     1.44e-33
Kurtosis:                       6.332   Cond. No.                         454.
==============================================================================

1 answer:

Answer 0 (score: 1)

To get a list of VIFs:

from statsmodels.stats.outliers_influence import variance_inflation_factor

variables = lm.model.exog
vif = [variance_inflation_factor(variables, i) for i in range(variables.shape[1])]
vif 
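For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below recomputes that definition with plain NumPy on made-up data; the helper name `vif_from_r2` and the synthetic columns are assumptions for illustration, not part of statsmodels. Unlike `lm.model.exog`, the matrix here holds only the predictors, and the helper adds the intercept itself.

```python
import numpy as np

def vif_from_r2(X, i):
    """VIF of column i: regress it on the remaining columns plus an intercept."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # intercept + other predictors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return 1.0 / (1.0 - r2)

# Synthetic predictors (hypothetical): x3 is nearly a copy of x1, x2 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.1 * rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

vifs = [vif_from_r2(X, i) for i in range(X.shape[1])]
print(vifs)  # x1 and x3 should come out large, x2 close to 1
```

A large value flags a column that the other predictors can almost reproduce, which is why it is the usual candidate for removal.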

To get their mean:

import numpy as np

np.array(vif).mean()
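Since the question asks for the VIF next to each coef, one possible way is to put both into a single DataFrame indexed by term name. This is only a sketch: the data below is synthetic (made-up values standing in for Advertising.csv, with Newspaper deliberately tied to Radio), and the names `table` and `predictors` are illustrative. Note that `lm.model.exog` includes the intercept column, so the sketch drops the `Intercept` row before averaging; whether to include it is a judgment call.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic stand-in for the Advertising data (hypothetical values, not the real file).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
})
# Newspaper is built to correlate with Radio so some collinearity shows up.
data["Newspaper"] = 0.8 * data["Radio"] + rng.normal(0, 10, n)
data["Sales"] = 3 + 0.045 * data["TV"] + 0.19 * data["Radio"] + rng.normal(0, 1.5, n)

lm = smf.ols(formula="Sales ~ TV + Radio + Newspaper", data=data).fit()

# One VIF per column of the design matrix, labelled with the term names.
exog = lm.model.exog
vif = pd.Series(
    [variance_inflation_factor(exog, i) for i in range(exog.shape[1])],
    index=lm.model.exog_names,
)

# Coefficients and VIFs side by side.
table = pd.DataFrame({"coef": lm.params, "VIF": vif})
predictors = table.drop(index="Intercept")

print(table)
print("mean VIF (predictors only):", predictors["VIF"].mean())
```

Sorting `predictors` by the `VIF` column would then show which variable to consider dropping first.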