Why does XGBoost produce a very high RMSE?

Date: 2019-12-03 19:45:16

Tags: xgboost

XGBoost produces the following output during training...

Will train until train-mae hasn't improved in 100 rounds.
[1] eval-rmse:0.572264  eval-mae:0.503361   train-rmse:0.581378 train-mae:0.505127
[2] eval-rmse:0.562002  eval-mae:0.493736   train-rmse:0.571314 train-mae:0.49589
[3] eval-rmse:0.552157  eval-mae:0.484482   train-rmse:0.561579 train-mae:0.486947
[4] eval-rmse:0.542568  eval-mae:0.475453   train-rmse:0.552166 train-mae:0.478281
[5] eval-rmse:0.533304  eval-mae:0.466712   train-rmse:0.543066 train-mae:0.4699
[6] eval-rmse:0.524383  eval-mae:0.458281   train-rmse:0.534271 train-mae:0.461785
[7] eval-rmse:0.515713  eval-mae:0.450087   train-rmse:0.525774 train-mae:0.453933
[8] eval-rmse:0.507356  eval-mae:0.442169   train-rmse:0.517533 train-mae:0.446304
[9] eval-rmse:0.499253  eval-mae:0.434472   train-rmse:0.509606 train-mae:0.438946
[10]    eval-rmse:0.491482  eval-mae:0.427049   train-rmse:0.501954 train-mae:0.431819

The output suggests that the RMSE decreases as we add more trees, i.e. the model is learning patterns in the data. However, I cannot reproduce that RMSE at all. Using the formula below, the MSE I compute is roughly 40% lower and does not decrease as I use more trees. (If you trust my MSE, the model is not learning, yet XGBoost says it is.) Note that the comparison is done entirely on the validation dataset.

import numpy as np

pred_val = bst.predict(dval, ntree_limit=20)
mse = np.var(pred_val - tgt_val)
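
For reference, one way to see how the quantity above relates to the RMSE that XGBoost prints: np.var of the residuals equals the mean squared residual minus the squared mean residual, so the two only coincide when the residuals average to zero. A minimal sketch, reusing pred_val and tgt_val from the snippet above (assumed to be NumPy arrays of validation predictions and targets):

resid = pred_val - tgt_val

# RMSE as XGBoost reports it: square root of the *mean* squared residual.
rmse = np.sqrt(np.mean(resid ** 2))

# Variance of the residuals, as in the snippet above: mean squared
# residual minus the squared mean residual.
var_resid = np.var(resid)

# Identity: mean(resid**2) = var_resid + mean(resid)**2, so the two
# quantities differ whenever the residuals have a nonzero mean.
print(rmse, np.sqrt(var_resid + np.mean(resid) ** 2))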
The parameters I used during training...
param = {'max_depth':4,                  # the maximum depth of each tree
         'eta':0.025,                     # the training step for each iteration
         'objective':'reg:linear', 
         'gamma': 1,                     # penalization term for number of leaves
         'lambda': 1,                    # penalization for value of each leaf
         'tree_method':'approx', 
         'eval_metric':metrics_list, 
         'silent':True, 
         'verbose_eval':None,
         'subsample':1,                # Subsample ratio of the training instances. Subsampling will occur once in every boosting iteration.
         'colsample_bylevel':1         # Subsample ratio of columns for each level.
        }
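
The call that produces the log above is not shown in the question; below is a minimal sketch of how parameters like these are typically passed to xgb.train. Here dtrain/dval are assumed DMatrix objects, num_boost_round is an arbitrary placeholder, and early_stopping_rounds=100 is inferred from the "Will train until ... 100 rounds" line in the log:

import xgboost as xgb

# dtrain / dval are assumed to be xgb.DMatrix objects built from the training
# and validation sets. The eval list order matches the log (eval first, train
# last), so early stopping watches train-mae, the last metric of the last set.
evallist = [(dval, 'eval'), (dtrain, 'train')]

bst = xgb.train(param, dtrain,
                num_boost_round=1000,         # placeholder round count
                evals=evallist,
                early_stopping_rounds=100)    # matches the 100-round message in the log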

0 Answers:

No answers yet.