XGBoost grid search with scoring='roc_auc' vs. roc_auc_score: why the different ROC AUC?

Date: 2018-08-03 10:54:10

Tags: python machine-learning scikit-learn xgboost grid-search

I have used a grid search (RandomizedSearchCV) for a classification problem:

from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold, RandomizedSearchCV

# A parameter grid for XGBoost
params = {
        'min_child_weight': [1, 5, 10],
        'gamma': [0.5, 1, 1.5, 2, 5],
        'subsample': [0.6, 0.8, 1.0],
        'colsample_bytree': [0.6, 0.8, 1.0],
        'max_depth': [3, 4, 5]
        }

# base classifier whose hyperparameters will be tuned
xgb = XGBClassifier(learning_rate=0.02, n_estimators=600,
                    objective='binary:logistic',
                    silent=True, nthread=1)
folds = 3
param_comb = 5

skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=1001)

random_search = RandomizedSearchCV(xgb, param_distributions=params,
                                   n_iter=param_comb, scoring='roc_auc',
                                   n_jobs=4,
                                   cv=skf.split(X_train_resampled, y_train_resampled),
                                   verbose=3, random_state=1001)
random_search.fit(X_train_resampled, y_train_resampled)

print('\n Best hyperparameters:')
print(random_search.best_params_)
print('\n Best estimator:')
print(random_search.best_estimator_)

After fitting, I got:

Best hyperparameters: {'subsample': 0.6, 'min_child_weight': 1, 'max_depth': 5, 'gamma': 1.5, 'colsample_bytree': 0.8}

Best estimator: XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=0.8, gamma=1.5, learning_rate=0.02,
       max_delta_step=0, max_depth=5, min_child_weight=1, missing=None,
       n_estimators=600, n_jobs=1, nthread=1, objective='binary:logistic',
       random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
       seed=None, silent=True, subsample=0.6)
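The cross-validated AUC of the winning combination can also be read directly from the fitted search object; I assume this is where the 0.97 score below comes from:

# mean cross-validated ROC AUC of the best parameter combination
print('\n Best CV ROC AUC score:')
print(random_search.best_score_)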

The best ROC AUC score = 0.9719630276538562. Then I ran the classifier myself:

from sklearn.metrics import roc_auc_score

model = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=0.8, gamma=1.5, learning_rate=0.02,
       max_delta_step=0, max_depth=5, min_child_weight=1, missing=None,
       n_estimators=600, n_jobs=4, objective='binary:logistic',
       random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
       seed=None, silent=True, subsample=0.6)

model.fit(X_train_resampled, y_train_resampled)
# make probability predictions for the test data
predictions = model.predict_proba(X_test_scaled)[:, 1]
# evaluate predictions
print('ROC AUC Score', roc_auc_score(y_test, predictions))

I have already read the related thread (What is the difference between cross_val_score with scoring='roc_auc' and roc_auc_score?), but the problem remains. Using predict_proba I get an ROC AUC score of 0.791423604769.

Why is there such a difference? Any suggestions? Before fitting the classifier I scale and resample the data, but with a fixed random state, the same as for the grid search. A minimal sketch of the comparison I have in mind is below.
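To compare like with like (assuming the same X_train_resampled, y_train_resampled, X_test_scaled and y_test as above), this sketch scores the fixed model both ways: with the cross-validated AUC on the resampled training data, which is what scoring='roc_auc' measures inside the search, and with roc_auc_score on the untouched test set:

from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.metrics import roc_auc_score

# Score the tuned model the same way the search does: 3-fold
# cross-validated AUC on the resampled training data
# (expected to be close to 0.97 if the gap only comes from
# which data set is being scored).
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1001)
cv_auc = cross_val_score(model, X_train_resampled, y_train_resampled,
                         scoring='roc_auc', cv=skf)
print('CV ROC AUC on resampled training data:', cv_auc.mean())

# Score the same model on the held-out test set, as in the code above.
model.fit(X_train_resampled, y_train_resampled)
test_pred = model.predict_proba(X_test_scaled)[:, 1]
print('ROC AUC on held-out test set:', roc_auc_score(y_test, test_pred))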

0 Answers:

There are no answers yet.