XGBClassifier and GridSearchCV / cross_val_score: problem with 'neg_log_loss' as the scoring function

Asked: 2020-10-02 17:55:40

Tags: python scikit-learn xgboost gridsearchcv

I am running GridSearchCV on an XGBClassifier with early stopping, and I want to use 'neg_log_loss' as the scoring function. If I run the following code:

from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, KFold

xgb_clsf = XGBClassifier()

X_train, X_val, y_train, y_val = train_test_split(dataset_prepared_stand, dataset_labels, random_state=42)

param_grid = {
    'n_estimators': [2000],
    'learning_rate': [0.05, 0.5, 1.],
    'max_depth': [5, 10, 20],
    }

fit_params={"early_stopping_rounds": 250, 
            "eval_metric": "logloss", 
            "eval_set": [[X_val, y_val]],
            "verbose": 0}

scores = ['neg_log_loss', 'roc_auc', 'accuracy', 'f1']

grid_search_xgb_clsf = GridSearchCV(xgb_clsf, param_grid, cv=KFold(n_splits=3, random_state=42, shuffle=True),
                                    scoring=scores, refit='neg_log_loss',
                                    return_train_score=True, verbose=100)

grid_search_xgb_clsf.fit(X_train, y_train, **fit_params)

I get the following warnings:

 RuntimeWarning: divide by zero encountered in log
   loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)
 RuntimeWarning: invalid value encountered in multiply
   loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)
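These warnings come from taking the log of a predicted probability that is exactly zero. A minimal sketch of the line quoted in the warnings, with hypothetical hard-coded probabilities (not taken from the actual run) standing in for what a very confident ensemble can emit from predict_proba, reproduces both messages:

```python
import numpy as np

# Hypothetical degenerate probabilities, as a very confident model can emit.
y_pred = np.array([[1.0, 0.0],
                   [1.0, 0.0]])
# One-hot encoded true labels for the two samples.
transformed_labels = np.array([[0, 1],   # true class predicted with prob 0: log(0) -> -inf
                               [1, 0]])  # 0 * log(0) -> nan ("invalid value in multiply")

# The exact line quoted in the warnings:
loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)
print(loss)  # inf for the first sample, nan for the second
```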

To avoid this error, I tried defining negative log loss myself from the log_loss metric and substituting it into the scores variable, and that works (no warning is shown):

from sklearn.metrics import log_loss, roc_auc_score, accuracy_score, f1_score, make_scorer

def _score_func(estimator, X, y):
    # Negate log_loss so that higher scores are better, as GridSearchCV expects.
    score = log_loss(y, estimator.predict_proba(X))
    return -score

scores = {'neg_log_loss': _score_func, 'roc_auc': make_scorer(roc_auc_score),
          'accuracy': make_scorer(accuracy_score), 'f1': make_scorer(f1_score)}

Is this a bug, or am I doing something wrong? I would like to keep the same behaviour, so that I don't have to define a different scoring function for XGBClassifier than for my other models. The same thing happens if I simply run cross_validate with 'neg_log_loss' in the scoring.
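For reference, a self-contained sketch of the workaround scorer dict plugged into cross_validate; the synthetic data from make_classification and the LogisticRegression stand-in are assumptions here, not part of the original setup:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score, accuracy_score, f1_score, make_scorer
from sklearn.model_selection import cross_validate

def _score_func(estimator, X, y):
    # Scorer-signature callable: negate log_loss so that higher is better.
    return -log_loss(y, estimator.predict_proba(X))

scores = {'neg_log_loss': _score_func, 'roc_auc': make_scorer(roc_auc_score),
          'accuracy': make_scorer(accuracy_score), 'f1': make_scorer(f1_score)}

# Synthetic binary classification problem standing in for the real dataset.
X, y = make_classification(n_samples=300, random_state=42)
cv_results = cross_validate(LogisticRegression(max_iter=1000), X, y,
                            scoring=scores, cv=3)
print(sorted(k for k in cv_results if k.startswith('test_')))
```

Each entry in the scoring dict shows up as a test_<name> key in the results, and test_neg_log_loss holds negative values (negated log loss), consistent with the built-in 'neg_log_loss' convention.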

0 Answers:

There are no answers.