I'm trying to get VIF scores from an h2o regression. Is there a VIF-like function, or equivalent data, stored in h2o?
Here is my example:
library(ggplot2)
library(h2o, quietly = TRUE)
library(tibble)
library(dplyr) # provides %>% and arrange(), both used below
# build h2o session
h2o::h2o.init()
#> Connection successful!
mtcars.df <- as.h2o(mtcars)
#>
|=================================================================| 100%
#set x & y vars
y <- "mpg"
x <- setdiff(names(mtcars), "mpg")
dput(names(mtcars))
#> c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am",
#> "gear", "carb")
model <- h2o.glm(y = y, x = x, training_frame = mtcars.df)
#>
|=================================================================| 100%
model
#> Model Details:
#> ==============
#>
#> H2ORegressionModel: glm
#> Model ID: GLM_model_R_1554907509984_6
#> GLM Model: summary
#>     family     link                               regularization
#> 1 gaussian identity Elastic Net (alpha = 0.5, lambda = 1.0132 )
#>   number_of_predictors_total number_of_active_predictors
#> 1                         10                           9
#>   number_of_iterations    training_frame
#> 1                    1 mtcars_sid_8128_1
#>
#> Coefficients: glm coefficients
#>        names coefficients standardized_coefficients
#> 1  Intercept    26.298144                 20.090625
#> 2        cyl    -0.447375                 -0.798977
#> 3       disp    -0.005674                 -0.703231
#> 4         hp    -0.011042                 -0.757065
#> 5       drat     0.859638                  0.459630
#> 6         wt    -1.185114                 -1.159584
#> 7       qsec     0.000000                  0.000000
#> 8         vs     0.655750                  0.330509
#> 9         am     1.116929                  0.557338
#> 10      gear     0.123540                  0.091148
#> 11      carb    -0.350465                 -0.566071
#>
#> H2ORegressionMetrics: glm
#> ** Reported on training data. **
#>
#> MSE: 6.511253
#> RMSE: 2.551716
#> MAE: 2.00629
#> RMSLE: 0.113459
#> Mean Residual Deviance : 6.511253
#> R^2 : 0.8149633
#> Null Deviance :1126.047
#> Null D.o.F. :31
#> Residual Deviance :208.3601
#> Residual D.o.F. :22
#> AIC :172.7651
#formula
f <- as.formula(paste(y, paste(x, collapse = " + "), sep = " ~ "))
model_lm <- lm(f, data = mtcars)
#model output
model_lm
#>
#> Call:
#> lm(formula = f, data = mtcars)
#>
#> Coefficients:
#> (Intercept)          cyl         disp           hp         drat
#>    12.30337     -0.11144      0.01334     -0.02148      0.78711
#>          wt         qsec           vs           am         gear
#>    -3.71530      0.82104      0.31776      2.52023      0.65541
#>        carb
#>    -0.19942
# package for vif variables
library(car)
#> Warning: package 'car' was built under R version 3.5.3
#> Loading required package: carData
#>
#> Attaching package: 'car'
#> The following object is masked from 'package:dplyr':
#>
#> recode
# list of VIF values
car::vif(model_lm) %>% as_tibble(rownames = "x_vars") %>% arrange(desc(value))
#> Warning: Calling `as_tibble()` on a vector is discouraged, because the behavior is likely to change in the future. Use `enframe(name = NULL)` instead.
#> This warning is displayed once per session.
#> # A tibble: 10 x 2
#>    x_vars value
#>    <chr>  <dbl>
#>  1 disp   21.6
#>  2 cyl    15.4
#>  3 wt     15.2
#>  4 hp      9.83
#>  5 carb    7.91
#>  6 qsec    7.53
#>  7 gear    5.36
#>  8 vs      4.97
#>  9 am      4.65
#> 10 drat    3.37
Created on 2019-04-10 by the reprex package (v0.2.1)
Answer 0 (score: 0)
H2O-3 does not currently provide a VIF function, but you can always create a JIRA ticket to request the feature, or try computing the values manually.
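As a rough illustration of the manual route, the sketch below computes each VIF from its definition, 1 / (1 - R^2), where R^2 comes from regressing one predictor on all the others. It uses base R's `lm` on the plain `mtcars` data frame rather than anything in h2o, and `manual_vif` is a hypothetical helper name made up for this example. Fitting the same auxiliary regressions with `h2o.glm` and `lambda = 0` should in principle give the same numbers, but that is an assumption and is not shown here.

```r
# Manual VIF: for each predictor, fit an auxiliary regression on the
# remaining predictors and return 1 / (1 - R^2).
# NOTE: manual_vif is a hypothetical helper, not part of h2o or car.
manual_vif <- function(data, predictors) {
  vifs <- vapply(predictors, function(v) {
    others <- setdiff(predictors, v)
    # Auxiliary regression: predictor v explained by all other predictors
    aux <- lm(reformulate(others, response = v), data = data)
    1 / (1 - summary(aux)$r.squared)
  }, numeric(1))
  # Largest VIF first, mirroring arrange(desc(value)) in the question
  sort(vifs, decreasing = TRUE)
}

x <- setdiff(names(mtcars), "mpg")
vif_values <- manual_vif(mtcars, x)
print(round(vif_values, 2))
```

The result is a named numeric vector, so it can be compared directly with the `car::vif(model_lm)` output in the question.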
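Alternatively, depending on your end goal, you could use remove_collinear_columns, which, as described in the documentation, is used to: "Specify whether to automatically remove collinear columns during model-building. When enabled, collinear columns will be dropped from the model and will have 0 coefficient in the returned model. This can only be set if there is no regularization (lambda = 0)."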