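I have been "using" Statsmodels for less than 2 days and am not at all familiar with the import commands etc. I want to run a simple variance_inflation_factor from here, but I am having some issues. My code follows: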
from numpy import *
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import statsmodels.formula.api as sm
from sklearn.linear_model import LinearRegression
import scipy, scipy.stats
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
from statsmodels.api import add_constant
from numpy import linalg as LA
import statsmodels as sm
## I have been adding libraries and modules/packages with the intention of erring on the side of caution
a = df1.years_exp
b = df1.leg_totalbills
c = df1.log_diff_rgdp
d = df1.unemployment
e = df1.expendituresfor
f = df1.direct_expenditures
g = df1.indirect_expenditures
sm.variance_inflation_factor((['a', 'b', 'c', 'd', 'e', 'f']), g)
Then I get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-61-bb126535eadd> in <module>()
----> 1 sm.variance_inflation_factor((['a', 'b', 'c', 'd', 'e', 'f']), g)
AttributeError: module 'statsmodels' has no attribute 'variance_inflation_factor'
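Can someone show me the correct syntax for loading and running this function? If it would be more convenient for me to post a link to some source code, please ask. However, I have a feeling this is just a simple syntax issue.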
Answer 0 (score: 1)
The variance_inflation_factor function lives in statsmodels.stats.outliers_influence, as seen in the docs, so to use it you must import it correctly. One option would be:
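from statsmodels.stats import outliers_influence
# variance_inflation_factor(exog, exog_idx) expects a 2-D array of regressors
# and the integer index of the column to test, not a list of variable names
exog = np.column_stack([a, b, c, d, e, f, g])
outliers_influence.variance_inflation_factor(exog, 6)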
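For reference, here is a minimal self-contained sketch of the call. It assumes synthetic data in place of df1; the frame `demo` and the column names `x1`, `x2`, `x3` are made up for illustration. The first argument is a 2-D design matrix with one column per variable, and the second is the index of the column whose VIF is wanted.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic stand-in for df1; names and values are illustrative only
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
# x3 is nearly a linear combination of x1 and x2, so its VIF should be large
demo["x3"] = demo["x1"] + demo["x2"] + rng.normal(scale=0.1, size=200)

exog = demo.to_numpy()                       # shape (n_observations, n_variables)
vif_x3 = variance_inflation_factor(exog, 2)  # VIF of column index 2 ("x3")
print(vif_x3)
```

Note that this sketch does not add an intercept column; whether to include one (for example with add_constant) depends on the model being checked.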
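Answer 1 (score: 1)
Thanks for asking this question! I had the same question today, except I wanted to calculate the variance inflation factor for each of the features. Here is a programmatic way to do this: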
from patsy import dmatrices
from statsmodels.stats.outliers_influence import variance_inflation_factor
# 'feature_1 + feature_2 ... feature_p'
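# use Index.difference: subtracting a list from df1.columns is not a set difference in current pandas
features_formula = "+".join(df1.columns.difference(["indirect_expenditures"]))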
# get y and X dataframes based on this formula:
# indirect_expenditures ~ feature_1 + feature_2 ... feature_p
y, X = dmatrices('indirect_expenditures ~' + features_formula, df1, return_type='dataframe')
# For each Xi, calculate VIF and save in dataframe
vif = pd.DataFrame()
vif["vif"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif["features"] = X.columns
vif
Note that the above code works only if you have imported pandas and df1 is a pandas DataFrame.
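As a rough check on what these numbers mean, the VIF of a feature is 1 / (1 - R²), where R² comes from regressing that feature on all the others. The sketch below computes this by hand with NumPy on made-up data. The helper `vif_by_hand` is hypothetical, not part of statsmodels, and it always adds an intercept, which plays the role of the Intercept column that dmatrices puts in X.

```python
import numpy as np

def vif_by_hand(X, i):
    """VIF of column i: regress it on the other columns plus an intercept."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return 1.0 / (1.0 - r2)

# Illustrative data: x3 is partly determined by x1 and x2
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + rng.normal(scale=0.5, size=500)
X = np.column_stack([x1, x2, x3])

vifs = [vif_by_hand(X, i) for i in range(X.shape[1])]
print(vifs)
```

A value near 1 means the feature is nearly uncorrelated with the rest; larger values mean more of its variance is explained by the other features.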
Answer 2 (score: 0)
a = df1.years_exp
b = df1.leg_totalbills
c = df1.log_diff_rgdp
d = df1.unemployment
e = df1.expendituresfor
f = df1.direct_expenditures
g = df1.indirect_expenditures
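from statsmodels.stats import outliers_influence
# stack the series as columns: rows are observations, columns are variables
ck = np.column_stack([a, b, c, d, e, f, g])
# VIF of column index 6 (g, indirect_expenditures) against the other columns
outliers_influence.variance_inflation_factor(ck, 6)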
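variance_inflation_factor reads variables from the columns of its first argument, so the orientation of the stacked array matters. A small sketch with made-up values (the arrays `a` and `b` here are illustrative, not the columns of df1) shows the difference between the two ways of stacking:

```python
import numpy as np

# Two toy variables with four observations each
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 1.0, 4.0, 3.0])

rows = np.array([a, b])         # one ROW per variable
cols = np.column_stack([a, b])  # one COLUMN per variable

print(rows.shape)  # (2, 4)
print(cols.shape)  # (4, 2)
```

Only the second layout matches the (observations, variables) shape the function expects; transposing the first with `.T` gives the same result.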