Question

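I am trying to find the "best" SVM with a polynomial kernel; as I understand it, the task is to find the best set of hyperparameters. I am using nested cross-validation to minimize bias. What I don't understand is this: if the outer cross-validation (say, 10-fold) gives me a different best set of hyperparameters for each split, how do I choose the best overall set? My end goal is to report a model with a single set of hyperparameters that maximizes accuracy.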

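from numpy import mean, std
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# X and y are assumed to be NumPy arrays holding the features and labels
cv_outer = StratifiedKFold(n_splits=3, shuffle=True, random_state=41)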
outer_results = list()

for train_index, test_index in cv_outer.split(X, y):
    # split data
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    y_train = y_train.ravel()
    # configure the cross-validation procedure
    cv_inner = StratifiedKFold(n_splits=2, shuffle=True, random_state=41)
    # define the model
    model = SVC(kernel='poly')
    # define search space
    space = dict()
    space['C'] = [0.1, 1, 10, 100]
    space['degree'] = [2, 4]
    # define search
    search = GridSearchCV(model, space, scoring='accuracy', cv=cv_inner, refit=True)
    # execute search
    result = search.fit(X_train, y_train)
    # get the best performing model fit on the whole training set
    best_model = result.best_estimator_
    # evaluate model on the hold out dataset
    yhat = best_model.predict(X_test)
    # evaluate the model
    acc = accuracy_score(y_test, yhat)
    # store the result
    outer_results.append(acc)
    # report progress
    print('>acc=%.3f, est=%.3f, cfg=%s' % (acc, result.best_score_, result.best_params_))
# summarize the estimated performance of the model
print('Accuracy: %.3f (%.3f)' % (mean(outer_results), std(outer_results)))
# note: best_model here is only the model selected in the last outer fold
print(best_model)
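
One common reading of nested cross-validation, offered here as a hedged sketch rather than a definitive answer, is that the outer loop estimates the accuracy of the whole tuning procedure and is not meant to pick a configuration. Under that assumption, a single hyperparameter set can be obtained by running the same grid search once more on all of the data. The synthetic `X` and `y` from `make_classification` are stand-ins for the real dataset, and the names `nested_scores`, `final_search`, `final_params` and `final_model` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the real X and y (an assumption for illustration only).
X, y = make_classification(n_samples=120, n_features=8, random_state=41)

# Same search space and fold settings as in the loop above.
space = {'C': [0.1, 1, 10, 100], 'degree': [2, 4]}
cv_inner = StratifiedKFold(n_splits=2, shuffle=True, random_state=41)
cv_outer = StratifiedKFold(n_splits=3, shuffle=True, random_state=41)


def make_search():
    # Build a fresh grid search over the polynomial-kernel SVM.
    return GridSearchCV(SVC(kernel='poly'), space, scoring='accuracy',
                        cv=cv_inner, refit=True)


# Nested CV: each outer fold reruns the inner search on its training part,
# so these scores describe the tuning procedure, not one fixed configuration.
nested_scores = cross_val_score(make_search(), X, y, scoring='accuracy', cv=cv_outer)
print('Nested accuracy: %.3f (%.3f)' % (nested_scores.mean(), nested_scores.std()))

# Final model: run the same search once on all data and keep its single best
# configuration; the nested estimate above is the accuracy figure to report.
final_search = make_search().fit(X, y)
final_params = final_search.best_params_
final_model = final_search.best_estimator_
print('Final configuration: %s' % final_params)
```

With this layout, differing `best_params_` across outer folds are not a problem to resolve; they mainly indicate how stable the search is on the available data.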

Hyperparameter tuning and nested cross-validation for an SVM model

0 answers: