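I am using the h2o.grid hyperparameter search to tune a GBM model. h2o GBM lets you add a weights column that specifies the weight of each observation. However, when I add it in h2o.grid, the run always fails with an illegal argument / missing values error, even though the weights column is fully populated. Has anyone run into something similar? Thanks.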
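Hyper-parameter: max_depth, 20 [2017-04-12 13:10:05] failure_details: Illegal argument(s) for GBM model: depth_grid_model_11. Details: ERRR on field: _weights_columns: Weights cannot have missing values. ERRR on field: _weights_columns: Weights cannot have missing values.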
============================
hyper_params = list( max_depth = c(4,6,8,12,16,20) ) ##faster for larger datasets
grid <- h2o.grid(
## hyper parameters
hyper_params = hyper_params,
## full Cartesian hyper-parameter search
search_criteria = list(strategy = "Cartesian"), ## default is Cartesian
## which algorithm to run
algorithm="gbm",
## identifier for the grid, to later retrieve it
grid_id="depth_grid",
## standard model parameters
x = X, #predictors,
y = Y, #response,
training_frame = datadev, #train,
validation_frame = dataval, #valid,
weights_column = "Adj_Bias_correction",
## more trees is better if the learning rate is small enough
## here, use "more than enough" trees - we have early stopping
ntrees = 10000,
## smaller learning rate is better
## since we have learning_rate_annealing, we can afford to start with a bigger learning rate
learn_rate = 0.05,
## learning rate annealing: learning_rate shrinks by 1% after every tree
## (use 1.00 to disable, but then lower the learning_rate)
learn_rate_annealing = 0.99,
## sample 80% of rows per tree
sample_rate = 0.8,
## sample 80% of columns per split
col_sample_rate = 0.8,
## fix a random number generator seed for reproducibility
seed = 1234,
## early stopping once the validation AUC doesn't improve by at least 0.01% for 5 consecutive scoring events
stopping_rounds = 5, stopping_tolerance = 1e-4, stopping_metric = "AUC",
## score every 10 trees to make early stopping reproducible (it depends on the scoring interval)
score_tree_interval = 10
)
## by default, display the grid search results sorted by increasing logloss (since this is a classification task)
grid
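
Since the error says the weights cannot have missing values, one thing worth ruling out is that the weights column is missing or contains NAs in one of the two frames passed to the grid (for example, populated in the training frame but not in the validation frame). This is only a guess at the cause, not a confirmed diagnosis. A minimal sketch of that check follows; `count_missing_weights` and `drop_missing_weights` are hypothetical helper names introduced here, not part of h2o.

```r
library(h2o)

# Hypothetical helper (not part of h2o): count NAs in the weights column of an
# H2OFrame, and stop early if the column is absent from the frame.
count_missing_weights <- function(frame, weights_col) {
  if (!(weights_col %in% colnames(frame))) {
    stop("Column '", weights_col, "' not found in frame")
  }
  # is.na() on an H2OFrame gives a 0/1 frame; sum() reduces it to one number
  sum(is.na(frame[, weights_col]))
}

# Hypothetical helper: keep only the rows whose weight is present.
drop_missing_weights <- function(frame, weights_col) {
  frame[!is.na(frame[, weights_col]), ]
}
```

With the frames from the question, this would be called as `count_missing_weights(datadev, "Adj_Bias_correction")` and `count_missing_weights(dataval, "Adj_Bias_correction")`. If either call reports a nonzero count, filtering that frame with `drop_missing_weights` (or filling in the weights) before calling h2o.grid is one way to test whether the error goes away.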