Validation score reported while training with xgboost differs from the final model's score

Time: 2019-02-18 15:12:10

Tags: validation machine-learning xgboost

I have a 3-class problem, but my metric is AUC, so I wrote a custom evaluation metric:

from sklearn.metrics import roc_auc_score

# Eval metric used while training: one-vs-rest AUC for class 2
def custom_eval_metric_class(preds, dtrain):
    labels = dtrain.get_label()
    # Binarize the labels: class 2 is positive, classes 0 and 1 are negative
    labels_processed = [1 if u == 2 else 0 for u in labels]
    # Predicted probability of class 2
    pred_proba = preds[:, 2]
    return 'auc', roc_auc_score(labels_processed, pred_proba)

# Final metric function: the same one-vs-rest AUC, given class-2 probabilities
def roc_auc_score_3class(y_test, y_score):
    metric_auc = roc_auc_score([1 if v == 2 else 0 for v in y_test], y_score)
    return metric_auc

My target class is 2.
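To make the metric concrete, here is a minimal sketch of the one-vs-rest AUC for class 2 in plain Python, with no scikit-learn dependency. The names `binary_auc` and `auc_class2` and the sample data are hypothetical and only for illustration. The pairwise (Mann-Whitney) formulation used here should agree with `roc_auc_score` on the same input.

```python
def binary_auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs where the
    positive sample gets the higher score; ties count as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_class2(y_true, proba):
    """One-vs-rest AUC for class 2, given per-class probabilities."""
    labels = [1 if v == 2 else 0 for v in y_true]  # class 2 is the positive class
    scores = [row[2] for row in proba]             # probability of class 2
    return binary_auc(labels, scores)


# Hypothetical toy data: 6 samples, 3 classes
y_true = [0, 1, 2, 2, 1, 0]
proba = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.3, 0.5, 0.2],
    [0.1, 0.8, 0.1],
    [0.5, 0.1, 0.4],
]
result = auc_class2(y_true, proba)
print(result)  # 0.75
```

Both metric functions above reduce to this computation, so they should return the same value whenever they receive the same labels and the same class-2 probabilities.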

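During training, the model reaches an AUC of .80+ on the validation set, but the final validation score does not reach that value. What could be going wrong here?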

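import gc
import time

import xgboost as xgb

def train_model(X, y, params=None, folds=5, model_type='lgb'):
    # NOTE: folds must be a splitter object with a .split method (for example
    # sklearn's StratifiedKFold); the integer default would fail below.
    scores = []  # per-fold validation AUC
    for fold_n, (train_index, valid_index) in enumerate(folds.split(X, y)):
        gc.collect()
        print('Fold', fold_n + 1, 'started at', time.ctime())
        X_train, X_valid = X.iloc[train_index], X.iloc[valid_index]
        y_train, y_valid = y.iloc[train_index], y.iloc[valid_index]

        if model_type == 'xgb':
            # Unpack params into keyword arguments (assumed intent); passing
            # params=params does not set the individual hyperparameters.
            model = xgb.XGBClassifier(n_estimators=5000, **(params or {}))
            model = model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
                              early_stopping_rounds=200,
                              eval_metric=custom_eval_metric_class, verbose=100)
            # Class-2 probabilities from the best iteration found by early stopping
            y_pred_valid = model.predict_proba(X_valid, model.best_ntree_limit)[:, 2]

        fold_score = roc_auc_score_3class(y_valid, y_pred_valid)
        scores.append(fold_score)
        print('Fold valid roc_auc:', fold_score)
    return scores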

Will train until validation_0-auc hasn't improved in 200 rounds.
[100]  validation_0-merror:0.211905 validation_0-auc:0.790956
[200]  validation_0-merror:0.214286 validation_0-auc:0.794158
[300]  validation_0-merror:0.210714 validation_0-auc:0.792962
Stopping. Best iteration:
[196]  validation_0-merror:0.214286 validation_0-auc:0.796363

Fold valid roc_auc: 0.731813592646

0 answers:

No answers yet