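I pass my predictor variables to the grid search below. As I understand it, this grid search selects the best variables to use in the model and discards the rest. However, I don't know which algorithm or selection metric it uses to choose them. Can someone tell me how it decides which variables to keep and which to discard?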
The function call:
grid.f <- h2o.grid(algorithm = "glm", # Algorithm type
                   grid_id = "grid.f", # Id so the results of each iteration are easier to retrieve later
                   x = predictors, # Predictor columns
                   y = response, # Target variable
                   training_frame = data, # Training set
                   hyper_params = hyper_parameters, # Alpha values to iterate over
                   remove_collinear_columns = TRUE, # Remove collinear columns
                   lambda_search = TRUE, # Search for the optimal lambda value
                   seed = p.seed, # Seed for reproducible results
                   keep_cross_validation_predictions = FALSE, # Do not keep the cross-validation predictions
                   compute_p_values = FALSE, # Do not compute p-values for the coefficients
                   family = family, # Distribution family
                   standardize = TRUE, # Standardize continuous variables
                   nfolds = p.folds, # Number of cross-validation folds
                   #max_active_predictors = p.max, # Cap on the number of active features
                   fold_assignment = "Modulo", # Fold assignment scheme for cross-validation
                   link = p.link) # Link function for the distribution
Answer 0 (score: 1)
Even without a grid search, H2O-3's GLM uses L1 regularization (also known as "lasso") to work out which variables can be dropped from the model.
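One way to see which variables the L1 penalty dropped is to look for coefficients that are exactly zero. The sketch below is only an illustration: `split_coefs` is a hypothetical helper (not part of h2o), and the example coefficients are made up. The commented lines show how it might be applied to the top model of the grid above, assuming a running H2O cluster and a trained grid with id "grid.f".

```r
# Hypothetical helper: split a named coefficient vector into
# predictors that were kept (non-zero) and dropped (exactly zero).
split_coefs <- function(coefs) {
  coefs <- coefs[names(coefs) != "Intercept"]  # the intercept is not a predictor
  list(kept    = names(coefs)[coefs != 0],
       dropped = names(coefs)[coefs == 0])
}

# Made-up coefficients, for illustration only
example <- c(Intercept = 1.2, age = 0.5, income = 0, height = -0.3)
result <- split_coefs(example)
print(result)

# With the grid from the question (needs a running H2O cluster):
# library(h2o)
# grid <- h2o.getGrid("grid.f")
# best <- h2o.getModel(grid@model_ids[[1]])
# split_coefs(h2o.coef(best))
```

Note that for categorical predictors the coefficient names refer to individual levels rather than to the original column.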
Elastic net is a blend of L1 (lasso) and L2 (ridge regression) penalties, and is controlled by the alpha and lambda parameters.
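For reference, the elastic net penalty is usually written as follows. This is the standard textbook form, with $\beta$ the coefficient vector; the exact scaling H2O applies to the likelihood term is not shown here.

```latex
\lambda \left( \alpha \lVert \beta \rVert_1 + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right)
```

With alpha = 1 this reduces to the pure lasso penalty and with alpha = 0 to pure ridge. Lambda sets the overall strength of the penalty. Only the L1 part can push coefficients exactly to zero, which is what removes variables from the model.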
The GLM manual is a good reference for the details: