Question

I used a grid search (RandomizedSearchCV in the code below) for a classification problem:

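from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from xgboost import XGBClassifier

# A parameter grid for XGBoost
params = {
    'min_child_weight': [1, 5, 10],
    'gamma': [0.5, 1, 1.5, 2, 5],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'max_depth': [3, 4, 5]
}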
# Base model; the search below fits it on the training data
xgb = XGBClassifier(learning_rate=0.02, n_estimators=600,
                    objective='binary:logistic',
                    silent=True, nthread=1)
folds = 3
param_comb = 5

skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=1001)

random_search = RandomizedSearchCV(xgb, param_distributions=params,
                                   n_iter=param_comb, scoring='roc_auc',
                                   n_jobs=4,
                                   cv=skf.split(X_train_resampled, y_train_resampled),
                                   verbose=3, random_state=1001)
random_search.fit(X_train_resampled, y_train_resampled)

print('\n Best hyperparameters:')
print(random_search.best_params_)
print('\n Best estimator:')
print(random_search.best_estimator_)

After that, I got:

Best hyperparameters: {'subsample': 0.6, 'min_child_weight': 1, 'max_depth': 5, 'gamma': 1.5, 'colsample_bytree': 0.8}

Best estimator: XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bytree=0.8, gamma=1.5, learning_rate=0.02, max_delta_step=0, max_depth=5, min_child_weight=1, missing=None, n_estimators=600, n_jobs=1, nthread=1, objective='binary:logistic', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=True, subsample=0.6)

Best ROC AUC score = 0.9719630276538562. Then I ran the classifier:

model=XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=0.8, gamma=1.5, learning_rate=0.02,
       max_delta_step=0, max_depth=5, min_child_weight=1, missing=None,
       n_estimators=600, n_jobs=4, objective='binary:logistic',
       random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
       seed=None, silent=True, subsample=0.6)

model.fit(X_train_resampled, y_train_resampled)
# make predictions for test data    
predictions = model.predict_proba(X_test_scaled)[:, 1]
# evaluate predictions
print('ROC AUC Score', roc_auc_score(y_test, predictions))

I have read a recent thread (What is the difference between cross_val_score with scoring='roc_auc' and roc_auc_score?), but the problem remains. I used predict_proba and got a ROC AUC score of 0.791423604769.
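For reference, a ROC AUC value depends only on how the predicted scores rank the positive examples against the negative ones, which is why predict_proba output is the right input here. Below is a minimal pure-Python sketch of that pairwise definition. The helper name `pairwise_auc` is made up for illustration and is not part of scikit-learn, and the labels and scores are toy values, not data from this question.

```python
def pairwise_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    Hypothetical helper for illustration; it is O(n_pos * n_neg), so it is
    only meant for small inputs.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores: 3 of the 4 positive/negative pairs are ordered correctly.
    print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because only the ranking matters, the same scoring rule applied to different data (cross-validation folds versus a held-out test set) can legitimately give different numbers.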

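Why is there such a difference? Any suggestions? Before fitting the classifier I scale and resample the data, but with a fixed random state, the same as for the grid search.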
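One thing that may be worth checking, given that the data is resampled before the search: if the resampling is random oversampling (an assumption, the question does not say which method was used), copies of the same original row can land in both the training and validation part of a fold, which tends to inflate the cross-validated score compared with an untouched test set. The sketch below uses only the standard library and toy data. The helpers `oversample`, `kfold` and `leaked_fraction` are hypothetical names, and the split is a plain shuffled k-fold, not StratifiedKFold.

```python
import random


def oversample(rows, rng):
    """Random oversampling: duplicate minority rows until classes are balanced.

    Each row is a (row_id, label) pair; duplicates keep the original row_id.
    """
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(rows)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra


def kfold(rows, k, rng):
    """Shuffle the rows and yield (train, valid) lists for k folds."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    for i in range(k):
        valid = shuffled[i::k]
        train = [r for j, r in enumerate(shuffled) if j % k != i]
        yield train, valid


def leaked_fraction(train, valid):
    """Share of validation rows whose row_id also occurs in the training part."""
    train_ids = {row_id for row_id, _ in train}
    return sum(row_id in train_ids for row_id, _ in valid) / len(valid)


rng = random.Random(1001)
# Toy data (not from the question): 10 positives, 90 negatives, unique row_ids.
data = [(i, 1 if i < 10 else 0) for i in range(100)]

# (a) Resample first, then cross-validate on the resampled set.
resampled = oversample(data, rng)
leak_before = [leaked_fraction(tr, va) for tr, va in kfold(resampled, 3, rng)]

# (b) Split first, then resample only the training part of each fold.
leak_after = [leaked_fraction(oversample(tr, rng), va)
              for tr, va in kfold(data, 3, rng)]

print("resample before CV:", leak_before)
print("resample inside CV:", leak_after)
```

In case (b) the validation rows never appear in the training part, so the leaked fraction is zero in every fold. With scikit-learn, the usual way to get that behaviour is to put the resampling step inside a pipeline that the search refits on each fold, rather than passing an already resampled X and y to the search.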

XGBoost: different ROC AUC from gridsearch scoring='roc_auc' and roc_auc_score?

0 answers: