Question

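I pass my predictor variables to the grid search below. As I understand it, this grid search selects the best variables to use in the model and discards the rest. However, I don't know which algorithm or selection metric it uses to pick the best variables. Can someone tell me how it decides which variables to keep and which to discard?

Code: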

  grid.f <- h2o.grid(algorithm = "glm",                          # Algorithm type
                     grid_id = "grid.f",                         # Id so the grid's models are easy to retrieve later
                     x = predictors,                             # Predictor columns
                     y = response,                               # Target variable
                     training_frame = data,                      # Training set
                     hyper_params = hyper_parameters,            # Alpha values to iterate over
                     remove_collinear_columns = TRUE,            # Remove collinear columns
                     lambda_search = TRUE,                       # Search for the optimal lambda value
                     seed = p.seed,                              # Seed for reproducible results
                     keep_cross_validation_predictions = FALSE,  # Do not keep cross-validation predictions
                     compute_p_values = FALSE,                   # Do not compute p-values for the coefficients
                     family = family,                            # Distribution family
                     standardize = TRUE,                         # Standardize continuous variables
                     nfolds = p.folds,                           # Number of cross-validation folds
                     #max_active_predictors = p.max,             # Cap on the number of active predictors
                     fold_assignment = "Modulo",                 # Fold assignment scheme for cross-validation
                     link = p.link)                              # Link function for the distribution

Answer 1

Even without a grid search, H2O-3's GLM uses L1 regularization (also known as the "lasso") to work out which variables can be dropped from the model.
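To illustrate why an L1 penalty can drop variables, here is a minimal base-R sketch of the soft-thresholding operator that underlies lasso-type updates. It is not H2O's actual solver code. The function name `soft_threshold`, the coefficient values, and the penalty values are all hypothetical and chosen only for illustration.

```r
# Soft-thresholding: shrink each value toward zero by gamma and set it to
# exactly zero when its magnitude is below gamma. Illustrative only.
soft_threshold <- function(z, gamma) {
  sign(z) * pmax(abs(z) - gamma, 0)
}

# Hypothetical unpenalized coefficient estimates for four predictors.
beta_unpenalized <- c(x1 = 2.5, x2 = -0.3, x3 = 0.05, x4 = -1.2)

# A larger penalty pushes more coefficients to exactly zero.
beta_small <- soft_threshold(beta_unpenalized, gamma = 0.1)
beta_large <- soft_threshold(beta_unpenalized, gamma = 0.5)

print(beta_small)
print(beta_large)

# Predictors whose coefficient is exactly zero are effectively dropped.
dropped <- names(beta_large)[beta_large == 0]
print(dropped)
```

With the larger penalty, the two predictors with the weakest estimates (`x2` and `x3`) end up with a coefficient of exactly zero, which is the sense in which the lasso "selects" variables.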

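Elastic net is a mix of L1 (lasso) and L2 (ridge regression) penalties, and is controlled by the alpha and lambda parameters.

The GLM booklet is a good reference for the details: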
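For reference, the elastic net objective is commonly written as below. The symbols are notation assumed here rather than taken from the post: $N$ is the number of observations, $\beta$ the coefficient vector, $\beta_0$ the intercept, and $L$ the likelihood for the chosen family.

$$
\min_{\beta,\,\beta_0} \; -\frac{1}{N}\log L(\beta, \beta_0 \mid X, y) \;+\; \lambda\left(\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\right)
$$

Here lambda sets the overall penalty strength and alpha sets the mix: alpha = 1 is the pure lasso, while alpha = 0 is pure ridge, which shrinks coefficients but does not force any of them to exactly zero.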

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/GLMBooklet.pdf
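To see which variables a fitted model actually kept, one option is to inspect its coefficients. The sketch below assumes an H2O cluster is already running, that the grid from the question (`grid_id = "grid.f"`) has been trained, and that the family is one for which `h2o.coef` returns a named numeric vector (for example gaussian or binomial). The variable names `best_model`, `kept`, and `dropped` are illustrative.

```r
library(h2o)

# Assumes h2o.init() has been called and the grid "grid.f" already exists.
grid <- h2o.getGrid("grid.f")

# Take the first model in the grid's default ordering.
best_model <- h2o.getModel(grid@model_ids[[1]])

# Coefficients on the original scale of the predictors.
coefs <- h2o.coef(best_model)

# Predictors with a nonzero coefficient were kept by the penalty;
# those with a coefficient of zero were dropped.
kept    <- names(coefs)[coefs != 0 & names(coefs) != "Intercept"]
dropped <- names(coefs)[coefs == 0]

print(kept)
print(dropped)
```

Note that the grid search itself only iterates over the hyperparameter values supplied (here, alpha); the zeroing of coefficients happens inside each GLM fit.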

How H2O selects the best variables for GLM
