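I can't make sense of the output of xgb.cv: 1) Is it the result of a single fold, or of the best of the k folds? 2) How is the dataset split into training and test sets: KFold, or a 0.8 / 0.2 split?
When I run the code, I can see the computation progress. It stops after the early-stopping rounds, once the best score has been reached.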
Code:
import xgboost as xgb

# model parameters
num_parallel_tree = 1
subsample = 1
colsample_bytree = 0.4
objective = 'binary:logistic'
learning_rate = 0.05
eval_metric = 'auc'
max_depth = 10
min_child_weight = 4
n_estimators = 5000
seed = 7
#cross-validation parameters
nfold = 5
early_stopping_rounds = 5
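# `param` is not shown in my snippet; assume it is assembled from the settings above
param = {
    'num_parallel_tree': num_parallel_tree,
    'subsample': subsample,
    'colsample_bytree': colsample_bytree,
    'objective': objective,
    'learning_rate': learning_rate,
    'eval_metric': eval_metric,
    'max_depth': max_depth,
    'min_child_weight': min_child_weight,
    'seed': seed,
}
# dtrain is an xgb.DMatrix built elsewhere (not shown)
bst_cv = xgb.cv(
    param,
    dtrain,
    num_boost_round=n_estimators,
    nfold=nfold,
    early_stopping_rounds=early_stopping_rounds,
    verbose_eval=True,
)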
Results:
[0] train-auc:0.910342+0.0015485 test-auc:0.850442+0.00619299
[1] train-auc:0.956268+0.00132653 test-auc:0.893746+0.00973467
...
[24] train-auc:0.984302+0.000617268 test-auc:0.934326+0.00338043
And then it stops.
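For context on the two questions: as far as I understand it, xgb.cv cuts the data into nfold parts, uses each part once as the test set, and prints the mean and standard deviation of the metric across folds at every boosting round (the `mean+std` pairs in the log). The sketch below only illustrates that idea with the standard library; it is not xgboost's implementation. The helper names `kfold_indices` and `summarize` are hypothetical, and the per-fold AUC values are made up for illustration.

```python
import random
import statistics


def kfold_indices(n_rows, nfold, seed=0):
    """Hypothetical helper: shuffle row indices and cut them into nfold test folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    # Every nfold-th shuffled index goes into the same fold.
    folds = [idx[i::nfold] for i in range(nfold)]
    splits = []
    for i, test_idx in enumerate(folds):
        # Training rows are all rows that are not in the current test fold.
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


def summarize(fold_scores):
    """Mean and population standard deviation of one metric across folds."""
    return statistics.mean(fold_scores), statistics.pstdev(fold_scores)


splits = kfold_indices(n_rows=100, nfold=5, seed=7)
for train_idx, test_idx in splits:
    # With nfold=5 each fold is an 80/20 split, repeated five times.
    print(len(train_idx), len(test_idx))

# Made-up per-fold test AUC values for a single boosting round (illustration only).
fold_auc = [0.93, 0.94, 0.93, 0.94, 0.93]
mean_auc, std_auc = summarize(fold_auc)
# Same layout as one entry of the verbose log: metric:mean+std
print(f"test-auc:{mean_auc:.6f}+{std_auc:.6f}")
```

Read this way, a line such as `[24] ... test-auc:0.934326+0.00338043` is an average over all five folds at round 24 rather than the score of one particular fold, and with nfold = 5 every fold happens to be a 0.8 / 0.2 split.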