R xgboost: using validation error as the early-stopping metric

Time: 2018-05-07 14:13:23

Tags: r xgboost

I am using a train and a validation dataset with an xgboost binary classification model.

params5 <- list(booster = "gbtree", objective = "binary:logistic", 
            eta=0.0001, gamma=0.5, max_depth=15, min_child_weight=1, subsample=0.6,
            colsample_bytree=0.4,seed =2222)


xgb_MOD5 <- xgb.train(params = params5, data = dtrain, nrounds = 4000,
                      watchlist = list(validation = dvalid, train = dtrain),
                      print_every_n = 30, early_stopping_rounds = 100,
                      maximize = FALSE, serialize = TRUE)

It automatically picks the train error as the stopping metric, so the model keeps training even while it is overfitting.

Multiple eval metrics are present. Will use train_error for early stopping.
Will train until train_error hasn't improved in 100 rounds.

How can I specify the validation error as the stopping metric?

2 answers:

Answer 0: (score: 1)

Thanks to @abhiieor for the solution. To add what I observed: when the watchlist contains only the validation set,

xgb_MOD5 <- xgb.train (params = params5, data = dtrain, nrounds = 400,watchlist = list(validation = dvalid),
                   print_every_n =30,early_stopping_rounds = 100, maximize = F ,serialize = TRUE)

the log at run time looks like this:

[1]  validation-error:0.222037
Will train until validation_error hasn't improved in 100 rounds.

[31] validation-error:0.201712
[61] validation-error:0.201635

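If we want to see both the train error and the validation error at run time, put the validation set second (that is, last) in the watchlist. The validation error is then used as the stopping metric: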

xgb_MOD5 <- xgb.train (params = params5, data = dtrain, nrounds = 400,watchlist = list(train =dtrain,validation = dvalid),
                  print_every_n =30,early_stopping_rounds = 100, maximize = F ,serialize = TRUE)

[1]  train-error:0.202131    validation-error:0.232341
Multiple eval metrics are present. Will use validation_error for early stopping.
Will train until validation_error hasn't improved in 100 rounds.
[31] train-error:0.174278    validation-error:0.202871
[61] train-error:0.173909    validation-error:0.202288
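
For anyone who wants to reproduce this behaviour without the original data, here is a minimal sketch. It assumes an xgboost R release from around the time of this thread (roughly the 0.7x to 1.x series), where the argument is called watchlist and the fitted model exposes best_iteration and evaluation_log; newer releases may name these differently, so check ?xgb.train for your version. The agaricus data bundled with the package stands in for dtrain and dvalid, and the parameter values are illustrative, not the ones from the question.

```r
library(xgboost)

# Stand-in data (assumption): the mushroom dataset shipped with xgboost
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dvalid <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)

# Illustrative parameters; eval_metric is set explicitly so the log shows "error"
params <- list(booster = "gbtree", objective = "binary:logistic",
               eval_metric = "error", eta = 0.1, max_depth = 3)

# The validation set is listed last, so it is the one used for early stopping
fit <- xgb.train(params = params, data = dtrain, nrounds = 200,
                 watchlist = list(train = dtrain, validation = dvalid),
                 print_every_n = 30, early_stopping_rounds = 10,
                 maximize = FALSE)

# Round with the best validation error, and the columns recorded per round
fit$best_iteration
names(fit$evaluation_log)
```

The "Will use ... for early stopping" line printed during training is the quickest confirmation of which dataset is driving the stop.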

Answer 1: (score: 0)

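I don't use the R bindings of xgboost, and the R-package documentation is not specific about this. However, the python-API documentation (see the documentation of the early_stopping_rounds parameter) has a relevant remark on this question: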

  

Requires at least one item in evals. If there's more than one, will use the last.

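Here, evals is the list of samples on which the metric is evaluated, i.e. it is analogous to your watchlist parameter. So my guess is that you may only need to swap the order of the items in the list you pass as that argument.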
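
The "last one wins" rule can be pictured with a toy helper. This is not xgboost code: stopping_metric is a hypothetical function that only mimics how the name in the log message appears to be formed (watchlist entry name, underscore, metric name), going by the messages quoted in this thread.

```r
# Hypothetical helper (not part of xgboost): picks the last watchlist name
# and joins it with the metric name, as seen in the log messages above.
stopping_metric <- function(watchlist_names, metric = "error") {
  paste(tail(watchlist_names, 1), metric, sep = "_")
}

# Order from the question: train is last, so train error drives the stop
question_order <- stopping_metric(c("validation", "train"))

# Swapped order: validation is last, so validation error drives the stop
swapped_order <- stopping_metric(c("train", "validation"))

print(question_order)
print(swapped_order)
```

With the real library, the same swap is made in the list passed as watchlist.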