Easy access to standardized residuals, Cook's distance, hat values (leverage), etc. in Python?

Date: 2017-09-19 15:37:02

Tags: python scikit-learn linear-regression statsmodels

I'm looking for influence statistics after fitting a linear regression. In R I can get them like this, for example:

hatvalues(fitted_model) #hatvalues (leverage)
cooks.distance(fitted_model) #Cook's D values
rstandard(fitted_model) #standardized residuals
rstudent(fitted_model) #studentized residuals
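For reference, these are the textbook definitions behind the four R functions above, for an OLS fit with n observations, p columns in X (intercept included) and residuals e_i. Here h_ii is what hatvalues returns, r_i is rstandard, t_i is rstudent and D_i is cooks.distance; s_(i) is the residual standard error with observation i left out.

```latex
\begin{aligned}
h_{ii} &= \left[ X (X^\top X)^{-1} X^\top \right]_{ii} \\
r_i &= \frac{e_i}{s \sqrt{1 - h_{ii}}}, \qquad s^2 = \frac{\sum_j e_j^2}{n - p} \\
t_i &= \frac{e_i}{s_{(i)} \sqrt{1 - h_{ii}}}, \qquad s_{(i)}^2 = \frac{(n - p)\, s^2 - e_i^2 / (1 - h_{ii})}{n - p - 1} \\
D_i &= \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}}
\end{aligned}
```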

How can I get the same statistics in Python with statsmodels after fitting a model like this:

#import statsmodels
import statsmodels.api as sm

#Fit linear model to any dataset
model = sm.OLS(Y,X)
results = model.fit()

#Creating a dataframe that includes the studentized residuals
sm.regression.linear_model.OLSResults.outlier_test(results)

Edit: see the answer below...

2 answers:

Answer 0 (score: 4)

Answer 1 (score: 1)

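Although the accepted answer is correct, I found it helpful to access the statistics individually as attributes of the influence instance returned by statsmodels.regression.linear_model.OLSResults.get_influence after fitting the model. That saved me from indexing into summary_frame, since I was only interested in one of the statistics, not all of them. Maybe this helps someone else: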

import statsmodels.api as sm

#Fit linear model to any dataset
model = sm.OLS(Y,X)
results = model.fit()

#create instance of influence
influence = results.get_influence()

#leverage (hat values)
leverage = influence.hat_matrix_diag

#Cook's D values (and p-values) as tuple of arrays
cooks_d = influence.cooks_distance

#standardized residuals
standardized_residuals = influence.resid_studentized_internal

#studentized residuals
studentized_residuals = influence.resid_studentized_external