Python hyperparameter optimization for XGBClassifier using RandomizedSearchCV

Date: 2017-05-12 00:53:30

Tags: python classification bayesian xgboost

I am trying to find the best hyperparameters for an XGBClassifier, with the goal of ending up with the most predictive set of attributes. I am using RandomizedSearchCV, iterating and validating with KFold.

Since I run this process a total of 5 times (numFolds = 5), I want the best results saved in a DataFrame called collector (specified below). So on each iteration, I want the best result and its score appended to the collector DataFrame.

import numpy as np
import pandas as pd
import xgboost as xgb

from scipy import stats
from sklearn.model_selection import RandomizedSearchCV, KFold
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score, roc_auc_score

clf_xgb = xgb.XGBClassifier(objective = 'binary:logistic')
param_dist = {'n_estimators': stats.randint(150, 1000),
              'learning_rate': stats.uniform(0.01, 0.6),
              'subsample': stats.uniform(0.3, 0.9),
              'max_depth': [3, 4, 5, 6, 7, 8, 9],
              'colsample_bytree': stats.uniform(0.5, 0.9),
              'min_child_weight': [1, 2, 3, 4]
             }
clf = RandomizedSearchCV(clf_xgb, param_distributions=param_dist, n_iter=25,
                         scoring='roc_auc', error_score=0, verbose=3, n_jobs=-1)

numFolds = 5
folds = KFold(n_splits=numFolds, shuffle=True)

collector = pd.DataFrame()
estimators = []
results = np.zeros(len(X))
score = 0.0

for train_index, test_index in folds.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf.fit(X_train, y_train)
    estimators.append(clf.best_estimator_)
    estcoll = pd.DataFrame(estimators)


    estcoll['score'] = score
    pd.concat([collector,estcoll])
    print "\n", len(collector), "\n"
score /= numFolds
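
One note on the search space above: scipy.stats.uniform(loc, scale) samples from the interval [loc, loc + scale], not [loc, scale]. So stats.uniform(0.3, 0.9) can draw subsample values up to 1.2 and stats.uniform(0.5, 0.9) can draw colsample_bytree values up to 1.4, both outside the documented (0, 1] range for those parameters. A quick illustration of the convention (endpoints only, not a recommendation):

stats.uniform(0.3, 0.9).ppf(1.0)   # upper endpoint: 0.3 + 0.9 = 1.2
stats.uniform(0.3, 0.7).ppf(1.0)   # upper endpoint: 1.0, i.e. loc=0.3, scale=0.7 keeps subsample <= 1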

For some reason, nothing is being saved to the DataFrame. Please help.
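
For reference, pd.concat returns a new DataFrame rather than modifying collector in place, so its result has to be assigned back for anything to accumulate. A minimal sketch of that step, assuming the intent is one row per fold (the column names are illustrative, not from the original code):

# inside the loop, after clf.fit(X_train, y_train)
fold_result = pd.DataFrame([{
    'best_params': clf.best_params_,    # best hyperparameters found for this fold
    'cv_score': clf.best_score_,        # mean CV roc_auc of the best candidate
    'test_score': roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]),
}])
collector = pd.concat([collector, fold_result], ignore_index=True)   # concat is not in-place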

Also, I have roughly 350 attributes to cycle through, with about 3.5K rows for training and 2K for testing. Would running this through a Bayesian hyperparameter optimization process possibly improve my results, or would it only save processing time?
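
As for the second question: with only 25 candidate configurations per fold, Bayesian optimization mainly spends the trial budget more efficiently; it may find a somewhat better configuration, but that is not guaranteed. If you want to try it, one option (a suggestion, not something used in the question) is scikit-optimize's BayesSearchCV, which is close to a drop-in replacement for RandomizedSearchCV. A sketch, assuming scikit-optimize is installed:

from skopt import BayesSearchCV
from skopt.space import Real, Integer

# Same estimator and scoring; only the search-space definition changes.
opt = BayesSearchCV(
    xgb.XGBClassifier(objective='binary:logistic'),
    {
        'n_estimators': Integer(150, 1000),
        'learning_rate': Real(0.01, 0.6, prior='log-uniform'),
        'subsample': Real(0.3, 1.0),
        'max_depth': Integer(3, 9),
        'colsample_bytree': Real(0.5, 1.0),
        'min_child_weight': Integer(1, 4),
    },
    n_iter=25,
    scoring='roc_auc',
    cv=5,
    n_jobs=-1,
)
opt.fit(X_train, y_train)   # exposes best_params_ / best_score_ just like RandomizedSearchCV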

0 Answers:

There are no answers yet.