Weights column issue when tuning a GBM model with h2o.grid

Posted: 2017-04-12 18:27:04

Tags: r grid h2o

I am using the h2o.grid hyper-parameter search to tune a GBM model. H2O's GBM accepts a weights column that assigns a weight to each observation. However, when I add it to the h2o.grid call, every grid model fails with an illegal argument / missing values error, even though the weights column is fully populated. Has anyone run into something similar? Thanks.

Hyper-parameter: max_depth, 20 [2017-04-12 13:10:05] failure_details: Illegal argument(s) for GBM model: depth_grid_model_11. Details: ERRR on field: _weights_columns: Weights cannot have missing values. ERRR on field: _weights_columns: Weights cannot have missing values.
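For reference, this is a minimal check that the weights column really is fully populated in both frames (h2o.nacnt and summary are standard h2o R functions; datadev, dataval and "Adj_Bias_correction" are the names used in the code below):

## count NAs in the weights column of the training and validation frames
h2o.nacnt(datadev[, "Adj_Bias_correction"])
h2o.nacnt(dataval[, "Adj_Bias_correction"])

## weights must also be numeric and non-negative; a quick summary shows the range
summary(datadev[, "Adj_Bias_correction"])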

============================

hyper_params = list( max_depth = c(4,6,8,12,16,20) ) ##faster for larger datasets

grid <- h2o.grid(
  ## hyper parameters
  hyper_params = hyper_params,

  ## full Cartesian hyper-parameter search
  search_criteria = list(strategy = "Cartesian"),  ## default is Cartesian

  ## which algorithm to run
  algorithm="gbm",

  ## identifier for the grid, to later retrieve it
  grid_id="depth_grid",

  ## standard model parameters
  x = X,  #predictors, 
  y = Y,  #response, 
  training_frame = datadev, #train, 
  validation_frame = dataval, #valid,
  weights_column = "Adj_Bias_correction",

  ## more trees is better if the learning rate is small enough 
  ## here, use "more than enough" trees - we have early stopping
  ntrees = 10000,                                                            

  ## smaller learning rate is better
  ## since we have learning_rate_annealing, we can afford to start with a bigger learning rate
  learn_rate = 0.05,                                                         

  ## learning rate annealing: learning_rate shrinks by 1% after every tree 
  ## (use 1.00 to disable, but then lower the learning_rate)
  learn_rate_annealing = 0.99,                                               

  ## sample 80% of rows per tree
  sample_rate = 0.8,                                                       

  ## sample 80% of columns per split
  col_sample_rate = 0.8, 

  ## fix a random number generator seed for reproducibility
  seed = 1234,                                                             

  ## early stopping once the validation AUC doesn't improve by at least 0.01% for 5 consecutive scoring events
  stopping_rounds = 5,   stopping_tolerance = 1e-4,   stopping_metric = "AUC", 

  ## score every 10 trees to make early stopping reproducible (it depends on the scoring interval)
  score_tree_interval = 10                                                
)

## by default, display the grid search results sorted by increasing logloss (since this is a classification task)
grid                              
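Once the grid finishes, the results can also be retrieved sorted by validation AUC, which matches the stopping_metric used above (a sketch using the standard h2o.getGrid and h2o.getModel functions):

## retrieve the grid sorted by AUC, best model first
sortedGrid <- h2o.getGrid("depth_grid", sort_by = "auc", decreasing = TRUE)
sortedGrid

## inspect the best model of the grid on the validation frame
best_model <- h2o.getModel(sortedGrid@model_ids[[1]])
h2o.auc(best_model, valid = TRUE)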

0 Answers:

There are no answers yet.