How does H2O select the best variables for a GLM?

Time: 2018-07-16 16:14:32

Tags: r h2o glm

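I pass my predictors to the grid search below. As I understand it, this grid search picks the best variables to use in the model and discards the rest. However, I don't know which algorithm or selection metric it uses to pick them. Can someone tell me how it decides which variables to keep and which to discard?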

Code:

  grid.f <- h2o.grid(algorithm = "glm",                          # Algorithm type
                     grid_id = "grid.f",                         # Id, so information on the iterations is easier to retrieve later
                     x = predictors,                             # Predictor columns
                     y = response,                               # Target variable
                     training_frame = data,                      # Training set
                     hyper_params = hyper_parameters,            # Alpha values to iterate over
                     remove_collinear_columns = TRUE,            # Remove collinear columns
                     lambda_search = TRUE,                       # Search for the optimal lambda value
                     seed = p.seed,                              # Seed, for reproducible results
                     keep_cross_validation_predictions = FALSE,  # Do not keep the cross-validation predictions
                     compute_p_values = FALSE,                   # Do not compute p-values for the coefficients
                     family = family,                            # Distribution family
                     standardize = TRUE,                         # Standardize numeric columns
                     nfolds = p.folds,                           # Number of cross-validation folds
                     #max_active_predictors = p.max,             # Maximum number of active predictors
                     fold_assignment = "Modulo",                 # Fold assignment scheme for cross-validation
                     link = p.link)                              # Link function

1 Answer:

Answer 0 (score: 1)

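Even without a grid search, H2O-3's GLM uses L1 regularization (also known as "lasso") to work out which variables can be dropped from the model. The L1 penalty shrinks some coefficients to exactly zero, and a variable with a zero coefficient is effectively removed.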
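To illustrate why an L1 penalty drops variables, here is a minimal base-R sketch of the soft-thresholding operator that lasso solvers typically apply to each standardized coefficient. It is only an illustration of the mechanism, not H2O's internal solver; the function name `soft_threshold`, the coefficient values, and the penalty strength `gamma = 0.5` are all made up for the example.

```r
# Soft-thresholding: shrink each value toward zero by gamma and set
# anything whose absolute value is at most gamma to exactly zero.
soft_threshold <- function(z, gamma) {
  sign(z) * pmax(abs(z) - gamma, 0)
}

# Hypothetical unpenalized coefficient estimates (made-up numbers).
beta_unpenalized <- c(x1 = 2.0, x2 = -0.3, x3 = 0.05, x4 = -1.2)

# Apply an illustrative L1 penalty strength (an assumption, not an H2O default).
beta_lasso <- soft_threshold(beta_unpenalized, gamma = 0.5)

# Variables with a non-zero coefficient stay in the model; the rest are dropped.
kept    <- names(beta_lasso)[beta_lasso != 0]
dropped <- names(beta_lasso)[beta_lasso == 0]

print(beta_lasso)
print(kept)
print(dropped)
```

Small coefficients are zeroed out while large ones are only shrunk, which is how a larger penalty ends up discarding more variables.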

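The elastic net is a blend of L1 (lasso) and L2 (ridge regression) penalties, controlled by the alpha and lambda parameters: alpha sets the mix between the two penalties, and lambda sets the overall penalty strength.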
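As a sketch of how the two parameters interact, the elastic net objective can be written as follows. This is the form I understand the H2O GLM documentation to use; the symbols $N$ (number of observations), $\beta$ (coefficients), $\beta_0$ (intercept), and $L$ (likelihood) are notation introduced here for illustration.

```latex
\min_{\beta,\,\beta_0} \; -\frac{1}{N}\,\log L(\beta, \beta_0 \mid X, y)
  \;+\; \lambda \left( \alpha \,\lVert \beta \rVert_1
  \;+\; \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right)
```

With alpha = 1 the penalty is pure lasso, with alpha = 0 it is pure ridge (which shrinks coefficients but does not set them to zero), and values in between mix the two. A grid over alpha combined with lambda_search therefore changes how aggressively variables are dropped, but the dropping itself comes from the L1 term.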

The H2O GLM booklet is a good reference for the details.