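I am trying to compute the variance inflation factor (VIF) for a dataset using statsmodels.stats.outliers_influence.variance_inflation_factor. The code ran on a dataset made up entirely of categorical variables and returned the VIF matrix. When I run the same code on a different dataset, one with numeric variables, I keep getting a TypeError. Please help. Here is the code: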
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor as vif

# Drop the response column so that only the predictors remain.
full_data = full_data.drop(response, axis=1)
data_columns = full_data.columns
# as_matrix() was removed from pandas; to_numpy() is its replacement.
data = full_data.to_numpy()
out = pd.DataFrame()
Variable_name = []
VIF = []
for col_name in data_columns:
    # Positional index of the column whose VIF is being computed.
    index = full_data.columns.get_loc(col_name)
    print(index)
    Variable_name.append(col_name)
    VIF.append(float(vif(data, index)))
out['Variable_name'] = Variable_name
out['Variance Inflation Factor'] = VIF
It works for a dataset like this:
a b c
1 1 1
1 2 2
1 1 3
1 1 4
2 1 5
But not for something like this:
-42.1585 14.7353 3.45338 4.67938 -19.0982
-17.0384 -60.3472 3.45338 4.67938 -19.0982
-42.1585 -5.62736 3.45338 4.67938 -19.0982
-17.0384 -98.0905 3.45338 4.67938 -19.0982
-17.0384 -5.62736 3.45338 4.67938 -19.0982