How to get VIF from an H2O regression

Asked: 2019-04-10 16:21:01

Tags: r regression h2o

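I am trying to get VIF (variance inflation factor) scores from an h2o regression. Does h2o have a VIF function, or does the model object store anything equivalent?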

Here is my example:


library(ggplot2)
library(h2o, quietly = TRUE)
library(tibble)
library(dplyr)  # supplies %>% and arrange(), both used below

# start an H2O session
h2o::h2o.init()
#>  Connection successful!

mtcars.df <- as.h2o(mtcars)

# set x & y vars
y <- "mpg"
x <- setdiff(names(mtcars), "mpg")

dput(names(mtcars))
#> c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", 
#> "gear", "carb")
model <- h2o.glm(y = y, x = x, training_frame = mtcars.df)

model 
#> Model Details:
#> ==============
#> 
#> H2ORegressionModel: glm
#> Model ID:  GLM_model_R_1554907509984_6 
#> GLM Model: summary
#>     family     link                              regularization
#> 1 gaussian identity Elastic Net (alpha = 0.5, lambda = 1.0132 )
#>   number_of_predictors_total number_of_active_predictors
#> 1                         10                           9
#>   number_of_iterations    training_frame
#> 1                    1 mtcars_sid_8128_1
#> 
#> Coefficients: glm coefficients
#>        names coefficients standardized_coefficients
#> 1  Intercept    26.298144                 20.090625
#> 2        cyl    -0.447375                 -0.798977
#> 3       disp    -0.005674                 -0.703231
#> 4         hp    -0.011042                 -0.757065
#> 5       drat     0.859638                  0.459630
#> 6         wt    -1.185114                 -1.159584
#> 7       qsec     0.000000                  0.000000
#> 8         vs     0.655750                  0.330509
#> 9         am     1.116929                  0.557338
#> 10      gear     0.123540                  0.091148
#> 11      carb    -0.350465                 -0.566071
#> 
#> H2ORegressionMetrics: glm
#> ** Reported on training data. **
#> 
#> MSE:  6.511253
#> RMSE:  2.551716
#> MAE:  2.00629
#> RMSLE:  0.113459
#> Mean Residual Deviance :  6.511253
#> R^2 :  0.8149633
#> Null Deviance :1126.047
#> Null D.o.F. :31
#> Residual Deviance :208.3601
#> Residual D.o.F. :22
#> AIC :172.7651

#formula
f <- as.formula(paste(y, paste(x, collapse = " + "), sep = " ~ "))

model_lm <- lm(f, data = mtcars)

#model output
model_lm
#> 
#> Call:
#> lm(formula = f, data = mtcars)
#> 
#> Coefficients:
#> (Intercept)          cyl         disp           hp         drat  
#>    12.30337     -0.11144      0.01334     -0.02148      0.78711  
#>          wt         qsec           vs           am         gear  
#>    -3.71530      0.82104      0.31776      2.52023      0.65541  
#>        carb  
#>    -0.19942

# package for vif variables
library(car)
#> Warning: package 'car' was built under R version 3.5.3
#> Loading required package: carData
#> 
#> Attaching package: 'car'
#> The following object is masked from 'package:dplyr':
#> 
#>     recode

# list of VIF values, largest first
car::vif(model_lm) %>% tibble::enframe(name = "x_vars") %>% arrange(desc(value))
#> # A tibble: 10 x 2
#>    x_vars value
#>    <chr>  <dbl>
#>  1 disp   21.6 
#>  2 cyl    15.4 
#>  3 wt     15.2 
#>  4 hp      9.83
#>  5 carb    7.91
#>  6 qsec    7.53
#>  7 gear    5.36
#>  8 vs      4.97
#>  9 am      4.65
#> 10 drat    3.37

Created on 2019-04-10 by the reprex package (v0.2.1)

1 Answer:

Answer 0 (score: 0)

H2O-3 does not currently provide a VIF function, but you can always create a JIRA ticket to request it as a feature, or compute the values manually.
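As a rough sketch of the manual route: the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. VIF depends only on the predictors, not on the response or on the library that fits the final model. The helper name `manual_vif` below is made up for illustration, and the sketch assumes every predictor is numeric, as in mtcars. It uses base R's `lm()`; the same loop could presumably be run with `h2o.glm(lambda = 0)` and `h2o.r2()` instead, but that variant is not shown here.

```r
# Manual VIF: regress each predictor on all the others, then take 1 / (1 - R^2).
# `manual_vif` is a hypothetical helper, not part of h2o or car.
# Assumes all predictors are numeric columns of `data`.
manual_vif <- function(data, predictors) {
  vapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Build the auxiliary formula, e.g. cyl ~ disp + hp + ...
    f <- reformulate(others, response = p)
    r2 <- summary(lm(f, data = data))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

x <- setdiff(names(mtcars), "mpg")
vif_values <- sort(manual_vif(mtcars, x), decreasing = TRUE)
round(vif_values, 2)
```

On mtcars this should reproduce the ranking that `car::vif()` gives in the question, with `disp` at the top.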

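Alternatively, depending on your end goal, you can use remove_collinear_columns, which, as the documentation states, is used to: "Specify whether to automatically remove collinear columns during model-building. When enabled, collinear columns will be dropped from the model and will have 0 coefficient in the returned model. This can only be set if there is no regularization (lambda = 0)."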
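A minimal sketch of that option, reusing the mtcars setup from the question. The object name `model_nc` is made up for illustration, and whether any column is actually dropped depends on the data.

```r
library(h2o)
h2o.init()

mtcars.df <- as.h2o(mtcars)
y <- "mpg"
x <- setdiff(names(mtcars), "mpg")

# lambda = 0 turns regularization off, which the documentation requires
# before remove_collinear_columns can be enabled.
model_nc <- h2o.glm(
  y = y, x = x,
  training_frame = mtcars.df,
  family = "gaussian",
  lambda = 0,
  remove_collinear_columns = TRUE
)

# Per the documentation, any dropped column is reported with a 0 coefficient.
h2o.coef(model_nc)
```

Note that this removes collinear columns rather than quantifying collinearity, so it is not a substitute for VIF if you need the scores themselves.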