Question

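I am using xgb.train() in the xgboost R package to fit a classification model, and I am trying to work out the best iteration at which to stop adding trees. I set early_stop_rounds = 6, and by watching the metrics at each iteration I can clearly see that the AUC on the validation data reaches a maximum and then declines. However, the model does not stop; it keeps running until it reaches the specified nrounds.

Question 1: Is the iteration at which validation performance starts to decline the best model (for the given parameters)?

Question 2: Why doesn't the model stop once validation performance starts to decline?

Question 3: What does the parameter maximize = FALSE mean? If it is set to FALSE, what does training stop on? Does it have to be FALSE when early_stop_rounds is set?

Question 4: How does the model know which entry in the watchlist is the validation data? I have seen people use test =, eval =, validation1 =, etc.

Thanks!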

param<-list(
  objective="binary:logistic",
  booster="gbtree",
  eta=0.02, #Control the learning rate
  max.depth=3, #Maximum depth of the tree
  subsample=0.8, #subsample ratio of the training instance
  colsample_bytree=0.5 # subsample ratio of columns when constructing each tree
)

watchlist<-list(train=mtrain,validation=mtest)

sgb_model<-xgb.train(params=param, # this is the modeling parameter set above
                 data = mtrain,
                 scale_pos_weight=1,
                 max_delta_step=1,
                 missing=NA,
                 nthread=2,
                 nrounds = 500, # maximum number of boosting rounds
                 verbose=2,
                 early_stop_rounds=6, #if performance not improving for 6 rounds, model iteration stops
                 watchlist=watchlist,
                 maximize=FALSE,
                 eval.metric="auc" #Maximize AUC to evaluate model
                 #metric_name = 'validation-auc'
                 )

Answer 1

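Answer to question 1: No, it is not necessarily the best model, but from a bias-variance trade-off point of view it is good enough.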
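Answer to question 2: Early stopping does work, so the problem is most likely in your code. The argument name is one candidate: xgb.train expects early_stopping_rounds, whereas your call passes early_stop_rounds, which is not recognised as that argument and so never switches early stopping on. If that is not it, could you share the progress output showing the train and test AUC at each boosting step to demonstrate the problem? If you are 100% sure it does not work, you can file a bug report on the XGBoost GitHub project.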
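Why a misspelled argument fails silently instead of raising an error can be shown with a small pure-R sketch. It assumes the argument layout of xgb.train in the 0.6 to 1.x R package (named arguments such as early_stopping_rounds, followed by `...`, whose contents are appended to params); `mock_train` is a hypothetical stand-in, not the real function.

```r
# Hypothetical stand-in mimicking the assumed xgb.train layout:
# named formals first, then `...` for extra booster parameters.
mock_train <- function(params = list(), data = NULL, nrounds = 1,
                       watchlist = list(), early_stopping_rounds = NULL,
                       maximize = NULL, ...) {
  # Unrecognised named arguments are appended to params
  params <- c(params, list(...))
  list(early_stopping_rounds = early_stopping_rounds, params = params)
}

# Misspelled: the value ends up in params, early stopping stays off
bad <- mock_train(nrounds = 500, early_stop_rounds = 6)
is.null(bad$early_stopping_rounds)   # TRUE
names(bad$params)                    # "early_stop_rounds"

# Correct spelling: the formal argument is matched
good <- mock_train(nrounds = 500, early_stopping_rounds = 6)
good$early_stopping_rounds           # 6
```

R only partially matches a supplied name that is a prefix of a formal argument, and early_stop_rounds is not a prefix of early_stopping_rounds, so it falls through to `...` without complaint.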
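Answer to question 3: maximize tells early stopping which direction counts as an improvement. With maximize = FALSE a lower score is treated as better (appropriate for metrics such as error or logloss), so training stops once the score has failed to decrease for the given number of rounds. It does not have to be FALSE; you mainly need to set it yourself when you use a custom evaluation function (for example, a custom merror-style metric). You always want AUC to go up, so maximize = TRUE is the right setting for you.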
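The effect of the direction setting is easiest to see on a toy curve. The sketch below is a simplified, hypothetical re-implementation of the patience rule (track the best score so far, stop once k rounds pass without improvement), not xgboost's own code, and the AUC values are made up.

```r
# Hypothetical patience rule: stop when `k` rounds pass without the
# score improving on the best seen so far, in the `maximize` direction.
early_stop_sim <- function(scores, k, maximize) {
  best_iter <- 1
  best_score <- scores[1]
  for (i in seq_along(scores)[-1]) {
    improved <- if (maximize) scores[i] > best_score else scores[i] < best_score
    if (improved) {
      best_iter <- i
      best_score <- scores[i]
    } else if (i - best_iter >= k) {
      return(list(best_iter = best_iter, stopped_at = i))
    }
  }
  list(best_iter = best_iter, stopped_at = length(scores))
}

# Made-up validation AUC: peaks at round 5, then declines
auc <- c(0.70, 0.74, 0.77, 0.79, 0.80, 0.79, 0.78,
         0.77, 0.76, 0.75, 0.74, 0.73, 0.72)

r_max <- early_stop_sim(auc, k = 6, maximize = TRUE)   # best 5, stops at 11
r_min <- early_stop_sim(auc, k = 6, maximize = FALSE)  # best 1, stops at 7
```

With maximize = TRUE the peak (round 5) is kept and training halts six rounds after it. With maximize = FALSE a rising AUC never counts as an improvement, so training stops at round 7 and reports round 1 as the best.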
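Answer to question 4: It is mainly position-based; names such as test, eval or validation1 are just labels. Put the training set first and the validation set after it, because by default early stopping monitors the last dataset in the watchlist.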
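To see why the order matters, here is a sketch assuming the behaviour documented for the cb.early.stop() callback in the 0.6 to 1.x R package: without a metric_name, the last column of the evaluation log is monitored, and '-' and '_' in metric names are treated as equivalent. The log and the `monitored_column` helper are hypothetical illustrations.

```r
# Made-up evaluation log: one column per (watchlist entry, metric),
# in watchlist order. The column names are only labels.
eval_log <- data.frame(
  iter = 1:3,
  train_auc = c(0.80, 0.84, 0.87),
  validation_auc = c(0.78, 0.80, 0.79)
)

# Hypothetical helper: which column early stopping would watch.
# No metric_name -> last column; '-' is treated the same as '_'.
monitored_column <- function(log, metric_name = NULL) {
  if (is.null(metric_name)) return(names(log)[ncol(log)])
  wanted <- gsub("-", "_", metric_name, fixed = TRUE)
  if (!wanted %in% names(log)) stop("no such column: ", metric_name)
  wanted
}

monitored_column(eval_log)               # "validation_auc"
monitored_column(eval_log, "train-auc")  # "train_auc"
# Listing the training set last would make it the monitored column
monitored_column(eval_log[, c("iter", "validation_auc", "train_auc")])  # "train_auc"
```

Training AUC usually keeps improving, so watching it would rarely trigger early stopping. In those package versions a column should be selectable regardless of position by passing a callback such as cb.early.stop(6, maximize = TRUE, metric_name = "validation_auc") through the callbacks argument.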

xgboost R package: early_stop_rounds does not trigger
