Question

I am cross-validating an XGBoost classifier on a dataset with 380,000 rows, using the following code.

from xgboost import XGBClassifier

newxgb = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1.0, gamma=7,
              learning_rate=0.1, max_delta_step=0, max_depth=6,
              min_child_weight=10, missing=None, n_estimators=750, n_jobs=-1,
              nthread=1, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=True, subsample=0.8, verbosity=1)


from sklearn.model_selection import cross_val_score
scores = cross_val_score(newxgb,
                         users.drop(['campaign', 'demo_localtime', 'y'], axis=1),
                         users.y, cv=10, scoring='roc_auc')

I get the following results:

print(scores)
0.602625
0.410916
0.30993
0.199518
0.188693
0.036554
0.020763
0.0842989
0.679293
0.658348

The results in the middle are clearly wrong (when I validate with a train-test split, I get AUC = 0.72).

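Two questions:
1. What causes this?
2. Whenever I run this CV code I always get the same pattern: the scores decrease and then increase. Is there a reason for that?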
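For context, here is a minimal sketch of what cv=10 does to the row order. For a classifier, an integer cv makes cross_val_score use StratifiedKFold without shuffling, so each test fold is a consecutive block of rows. The data below is synthetic (not the users table), and the names fold_spans, plain_spans and shuffled_spans are hypothetical helpers for illustration only. Whether the real rows are sorted by something like campaign or time is an assumption I have not verified.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data: 1000 rows, alternating binary labels.
n_rows = 1000
X = np.zeros((n_rows, 1))
y = np.arange(n_rows) % 2


def fold_spans(cv, X, y):
    """Return, for each test fold, the width of the row range it covers."""
    spans = []
    for _, test_idx in cv.split(X, y):
        spans.append(int(test_idx.max() - test_idx.min() + 1))
    return spans


# Equivalent to cv=10 for a classifier: stratified folds, no shuffling.
plain_spans = fold_spans(StratifiedKFold(n_splits=10), X, y)

# Same folds, but rows are shuffled before being assigned to folds.
shuffled_spans = fold_spans(
    StratifiedKFold(n_splits=10, shuffle=True, random_state=0), X, y
)

print("unshuffled fold spans:", plain_spans)
print("shuffled fold spans:  ", shuffled_spans)
```

Without shuffling, each fold covers a narrow slice of the row range; with shuffling, each fold is spread over almost the whole dataset. If the descending-then-ascending pattern disappears when cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0) is passed to cross_val_score, row ordering would be a likely cause.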

What could cause cross-validation to show large differences in AUC scores?

0 answers: