XGBoost early stopping raises KeyError: 'best_msg'

Asked: 2018-04-04 11:21:10

Tags: python scikit-learn xgboost

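I am trying to use early stopping with the XGBoost scikit-learn wrapper on a regression problem. Strangely, the eval_metric used for early stopping (rmse in my case) fails to compute at every boosting round. This is odd, because the same estimator works with the same eval_set when early stopping is turned off.

Here is the code: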

from xgboost import XGBRegressor

# Hold out the last n_splits labelled rows for evaluation.
eval_train_indices = y.dropna()[:-n_splits].index
eval_test_indices = y.dropna()[-n_splits:].index

X_train, X_test = X.loc[eval_train_indices, :], X.loc[eval_test_indices, :]
y_train, y_test = y.loc[eval_train_indices], y.loc[eval_test_indices]

eval_set = [(X_train, y_train), (X_test, y_test)]

# params is assumed to be a dict of extra XGBRegressor keyword arguments.
predictor = XGBRegressor(n_estimators=50000, subsample=0.8, **params)

predictor.fit(X, y,
              eval_metric=["rmse"],
              eval_set=eval_set,
              early_stopping_rounds=40,
              verbose=True)

The error message it produces:

    <ipython-input-65-358402bfa21c> in fit(self, T)
    147                   early_stopping_rounds=40,
    148                   verbose=True)
    150 
    151         n_estimators=int(self.predictor.best_iteration*1.0)

/Users/Nicolas/anaconda2/lib/python2.7/site-packages/xgboost-0.7-py2.7.egg/xgboost/sklearn.pyc in fit(self, X, y, sample_weight, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model)
    291                               early_stopping_rounds=early_stopping_rounds,
    292                               evals_result=evals_result, obj=obj, feval=feval,
--> 293                               verbose_eval=verbose, xgb_model=xgb_model)
    294 
    295         if evals_result:

/Users/Nicolas/anaconda2/lib/python2.7/site-packages/xgboost-0.7-py2.7.egg/xgboost/training.pyc in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks, learning_rates)
    202                            evals=evals,
    203                            obj=obj, feval=feval,
--> 204                            xgb_model=xgb_model, callbacks=callbacks)
    205 
    206 

/Users/Nicolas/anaconda2/lib/python2.7/site-packages/xgboost-0.7-py2.7.egg/xgboost/training.pyc in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
     97                                end_iteration=num_boost_round,
     98                                rank=rank,
---> 99                                evaluation_result_list=evaluation_result_list))
    100         except EarlyStopException:
    101             break

/Users/Nicolas/anaconda2/lib/python2.7/site-packages/xgboost-0.7-py2.7.egg/xgboost/callback.pyc in callback(env)
    245                                    best_msg=state['best_msg'])
    246         elif env.iteration - best_iteration >= stopping_rounds:
--> 247             best_msg = state['best_msg']
    248             if verbose and env.rank == 0:
    249                 msg = "Stopping. Best iteration:\n{}\n\n"

KeyError: 'best_msg'

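For some reason, XGBoost seems unable to compute the RMSE during the early-stopping rounds, even though it does work when evaluating on the train and test eval sets without early stopping. With verbose=True, the following is printed: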

[0] validation_0-rmse:nan   validation_1-rmse:nan
Multiple eval metrics have been passed: 'validation_1-rmse' will be used for early stopping.

Will train until validation_1-rmse hasn't improved in 40 rounds.
[1] validation_0-rmse:nan   validation_1-rmse:nan
[2] validation_0-rmse:nan   validation_1-rmse:nan
[3] validation_0-rmse:nan   validation_1-rmse:nan
[4] validation_0-rmse:nan   validation_1-rmse:nan
[5] validation_0-rmse:nan   validation_1-rmse:nan
[6] validation_0-rmse:nan   validation_1-rmse:nan
[7] validation_0-rmse:nan   validation_1-rmse:nan
[8] validation_0-rmse:nan   validation_1-rmse:nan
[9] validation_0-rmse:nan   validation_1-rmse:nan
[10]    validation_0-rmse:nan   validation_1-rmse:nan
[11]    validation_0-rmse:nan   validation_1-rmse:nan
[12]    validation_0-rmse:nan   validation_1-rmse:nan
[13]    validation_0-rmse:nan   validation_1-rmse:nan
[14]    validation_0-rmse:nan   validation_1-rmse:nan
[15]    validation_0-rmse:nan   validation_1-rmse:nan
[16]    validation_0-rmse:nan   validation_1-rmse:nan
[17]    validation_0-rmse:nan   validation_1-rmse:nan
[18]    validation_0-rmse:nan   validation_1-rmse:nan
[19]    validation_0-rmse:nan   validation_1-rmse:nan
[20]    validation_0-rmse:nan   validation_1-rmse:nan
[21]    validation_0-rmse:nan   validation_1-rmse:nan
[22]    validation_0-rmse:nan   validation_1-rmse:nan
[23]    validation_0-rmse:nan   validation_1-rmse:nan
[24]    validation_0-rmse:nan   validation_1-rmse:nan
[25]    validation_0-rmse:nan   validation_1-rmse:nan
[26]    validation_0-rmse:nan   validation_1-rmse:nan
[27]    validation_0-rmse:nan   validation_1-rmse:nan
[28]    validation_0-rmse:nan   validation_1-rmse:nan
[29]    validation_0-rmse:nan   validation_1-rmse:nan
[30]    validation_0-rmse:nan   validation_1-rmse:nan
[31]    validation_0-rmse:nan   validation_1-rmse:nan
[32]    validation_0-rmse:nan   validation_1-rmse:nan
[33]    validation_0-rmse:nan   validation_1-rmse:nan
[34]    validation_0-rmse:nan   validation_1-rmse:nan
[35]    validation_0-rmse:nan   validation_1-rmse:nan
[36]    validation_0-rmse:nan   validation_1-rmse:nan
[37]    validation_0-rmse:nan   validation_1-rmse:nan
[38]    validation_0-rmse:nan   validation_1-rmse:nan
[39]    validation_0-rmse:nan   validation_1-rmse:nan
[40]    validation_0-rmse:nan   validation_1-rmse:nan

I don't even understand what could cause the RMSE computation to fail. It could be due to missing values, but I see none when I print predictor.predict(X_test).
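Printing the predictions only inspects the model output, not the inputs. The code above builds the eval sets from y.dropna() but calls fit on the full X and y, which suggests y itself may contain NaN. A minimal sketch of checking the inputs directly and keeping only labelled rows follows; the toy X and y and the helper name count_missing are hypothetical stand-ins, not part of the original code.

```python
import numpy as np
import pandas as pd


def count_missing(X, y):
    """Return (number of NaN cells in the features, number of NaN labels)."""
    return int(X.isna().sum().sum()), int(y.isna().sum())


# Hypothetical toy data standing in for the question's X and y.
X = pd.DataFrame({"f0": [1.0, 2.0, 3.0, 4.0], "f1": [0.5, np.nan, 1.5, 2.0]})
y = pd.Series([10.0, np.nan, 30.0, 40.0])

missing_in_X, missing_in_y = count_missing(X, y)
print(missing_in_X, missing_in_y)

# Keep only the rows whose label is present before fitting.
labelled = y.notna()
X_fit, y_fit = X.loc[labelled], y.loc[labelled]
print(len(X_fit), int(y_fit.isna().sum()))
```

If the label count is non-zero, fitting on X_fit and y_fit (or on X_train and y_train from the question) instead of the full X and y would be the first thing to try.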

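2 Answers:

Answer 0: (score: 1)

This is caused by NaN values; try removing or replacing them and check whether it works.

Answer 1: (score: 0)

I only ran into this problem after upgrading to xgboost 0.80 in order to use the SHAP module. The earlier version, xgboost 0.6a1, worked fine.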
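To illustrate why NaN values could surface as a KeyError rather than a clearer message, here is a simplified sketch of an early-stopping loop that minimises a metric. It is modelled loosely on the callback lines visible in the traceback and is not the library's actual code; the function name run_early_stopping and the message format are hypothetical. Any comparison with NaN is False, so the "best" state is never recorded, and the later lookup fails.

```python
def run_early_stopping(scores, stopping_rounds):
    """Simplified stand-in for an early-stopping callback (lower is better)."""
    state = {"best_score": float("inf"), "best_iteration": 0}
    for iteration, score in enumerate(scores):
        if score < state["best_score"]:
            # Never reached when score is NaN: NaN < x is always False.
            state["best_score"] = score
            state["best_iteration"] = iteration
            state["best_msg"] = "[{}] rmse:{}".format(iteration, score)
        elif iteration - state["best_iteration"] >= stopping_rounds:
            # Raises KeyError if no round ever improved on the initial score.
            return state["best_msg"]
    return state.get("best_msg")


# Finite scores: stops once there is no improvement for 2 rounds.
print(run_early_stopping([3.0, 2.0, 2.5, 2.6, 2.7], stopping_rounds=2))

# All-NaN scores: 'best_msg' is never set, so the lookup fails.
try:
    run_early_stopping([float("nan")] * 5, stopping_rounds=2)
except KeyError as err:
    print("KeyError:", err)
```

Under this reading, the KeyError is a symptom: the underlying problem is that the metric is NaN from round 0, which matches the verbose log in the question.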