I am running GridSearchCV on an XGBClassifier with early stopping, and I want to use 'neg_log_loss' as the scoring function. If I run the following code:
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

xgb_clsf = XGBClassifier()
X_train, X_val, y_train, y_val = train_test_split(
    dataset_prepared_stand, dataset_labels, random_state=42)
param_grid = {
    'n_estimators': [2000],
    'learning_rate': [0.05, 0.5, 1.],
    'max_depth': [5, 10, 20],
}
fit_params = {"early_stopping_rounds": 250,
              "eval_metric": "logloss",
              "eval_set": [[X_val, y_val]],
              "verbose": 0}
scores = ['neg_log_loss', 'roc_auc', 'accuracy', 'f1']
grid_search_xgb_clsf = GridSearchCV(xgb_clsf, param_grid,
                                    cv=KFold(n_splits=3, random_state=42, shuffle=True),
                                    scoring=scores, refit='neg_log_loss',
                                    return_train_score=True, verbose=100)
grid_search_xgb_clsf.fit(X_train, y_train, **fit_params)
I get the following runtime warnings:
RuntimeWarning: divide by zero encountered in log
  loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)
RuntimeWarning: invalid value encountered in multiply
  loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)
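For context, both messages come from NumPy evaluating the log of a probability that is exactly zero. A minimal sketch (made-up values, not the actual data) that reproduces the two warnings, assuming predict_proba returned an exact 0 for one class:

```python
import numpy as np

y_pred = np.array([[1.0, 0.0]])           # predicted probabilities; class 1 gets exactly 0
transformed_labels = np.array([[1, 0]])   # one-hot encoded true label
loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)
# np.log(0.0) -> -inf ("divide by zero"); then 0 * -inf -> nan ("invalid value
# encountered in multiply"), so loss ends up as [nan]
print(loss)
```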
To avoid these warnings, I tried defining negative log loss myself from the log_loss metric and passing it through the scores variable, and that works (no warnings are shown):
from sklearn.metrics import (accuracy_score, f1_score, log_loss,
                             make_scorer, roc_auc_score)

def _score_func(estimator, X, y):
    score = log_loss(y, estimator.predict_proba(X))
    return -score

scores = {'neg_log_loss': _score_func,
          'roc_auc': make_scorer(roc_auc_score),
          'accuracy': make_scorer(accuracy_score),
          'f1': make_scorer(f1_score)}
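For what it's worth, an equivalent of the hand-written _score_func can also be built with make_scorer (a sketch, not code from the question; note the probability flag is named needs_proba in older scikit-learn releases and response_method in newer ones):

```python
import inspect

from sklearn.metrics import log_loss, make_scorer

# Build a scorer returning -log_loss, equivalent to _score_func above.
# greater_is_better=False makes make_scorer negate the loss; which keyword
# requests predict_proba output depends on the scikit-learn version.
if "response_method" in inspect.signature(make_scorer).parameters:
    neg_log_loss_scorer = make_scorer(log_loss, greater_is_better=False,
                                      response_method="predict_proba")
else:  # older scikit-learn releases
    neg_log_loss_scorer = make_scorer(log_loss, greater_is_better=False,
                                      needs_proba=True)
```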
Is this a bug, or am I doing something wrong? I would like to keep the built-in 'neg_log_loss' behavior so I don't have to define a custom function for XGBClassifier that differs from the one used with my other models. The same thing happens if I simply run cross_validate with 'neg_log_loss' in scoring.