XGBoost generates the following output during training...
Will train until train-mae hasn't improved in 100 rounds.
[1] eval-rmse:0.572264 eval-mae:0.503361 train-rmse:0.581378 train-mae:0.505127
[2] eval-rmse:0.562002 eval-mae:0.493736 train-rmse:0.571314 train-mae:0.49589
[3] eval-rmse:0.552157 eval-mae:0.484482 train-rmse:0.561579 train-mae:0.486947
[4] eval-rmse:0.542568 eval-mae:0.475453 train-rmse:0.552166 train-mae:0.478281
[5] eval-rmse:0.533304 eval-mae:0.466712 train-rmse:0.543066 train-mae:0.4699
[6] eval-rmse:0.524383 eval-mae:0.458281 train-rmse:0.534271 train-mae:0.461785
[7] eval-rmse:0.515713 eval-mae:0.450087 train-rmse:0.525774 train-mae:0.453933
[8] eval-rmse:0.507356 eval-mae:0.442169 train-rmse:0.517533 train-mae:0.446304
[9] eval-rmse:0.499253 eval-mae:0.434472 train-rmse:0.509606 train-mae:0.438946
[10] eval-rmse:0.491482 eval-mae:0.427049 train-rmse:0.501954 train-mae:0.431819
The output suggests that the out-of-sample RMSE decreases as we add more trees, i.e. the model is learning patterns in the data. However, I cannot reproduce the RMSE at all. Using the formula below, the MSE I compute is roughly 40% lower, and it does not decrease as I use more trees. (If you believe my MSE, the model is not learning, yet XGBoost says it is.) Note that the comparison is done entirely on the validation dataset.
import numpy as np

pred_val = bst.predict(dval, ntree_limit=20)  # predict with the first 20 trees only
mse = np.var(pred_val - tgt_val)              # my MSE estimate on the validation set
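For comparison, this is how I would compute RMSE directly, on the same scale as the eval-rmse column above (a minimal sketch; it assumes tgt_val is a NumPy array of validation labels aligned with the rows of dval):

errors = pred_val - tgt_val
mse_direct = np.mean(errors ** 2)   # mean squared error
rmse_direct = np.sqrt(mse_direct)   # comparable to eval-rmse
# Caveat: np.var(errors) subtracts the mean error before squaring, so it
# matches mse_direct only when the mean prediction error is zero.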
The parameters I used in training...
param = {'max_depth': 4,             # maximum depth of each tree
         'eta': 0.025,               # learning rate (shrinkage) applied at each boosting iteration
         'objective': 'reg:linear',
         'gamma': 1,                 # minimum loss reduction to split; penalizes the number of leaves
         'lambda': 1,                # L2 penalty on the value of each leaf
         'tree_method': 'approx',
         'eval_metric': metrics_list,
         'silent': True,
         'verbose_eval': None,
         'subsample': 1,             # subsample ratio of training instances, drawn once per boosting iteration
         'colsample_bylevel': 1}     # subsample ratio of columns for each tree level
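For context, the training call that produces the log above looks roughly like this (a sketch, not my exact code: dtrain and dval are assumed xgb.DMatrix objects, num_round is a placeholder, and metrics_list = ['rmse', 'mae'] is inferred from the logged columns):

import xgboost as xgb

metrics_list = ['rmse', 'mae']                  # inferred from the log columns
evallist = [(dval, 'eval'), (dtrain, 'train')]  # 'train' last, so early stopping watches train-mae
num_round = 1000                                # placeholder for the number of boosting rounds
bst = xgb.train(param, dtrain, num_boost_round=num_round,
                evals=evallist, early_stopping_rounds=100)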