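I am trying to find the "best" SVM with a polynomial kernel; as I understand it, the task is to find the best set of hyperparameters. I am using nested cross-validation to minimize bias. What I don't understand is this: if each split of my outer cross-validation (say, 10-fold) gives me a different best set of hyperparameters, how do I pick the best overall set? My ultimate goal is to report a model with a single set of hyperparameters that maximizes accuracy.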
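from numpy import mean, std
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# X and y are assumed to be NumPy arrays holding the features and labels
# configure the outer cross-validation procedure
cv_outer = StratifiedKFold(n_splits=3, shuffle=True, random_state=41)
outer_results = list()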
for train_index, test_index in cv_outer.split(X, y):
    # split data
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    y_train = y_train.ravel()
    # configure the cross-validation procedure
    cv_inner = StratifiedKFold(n_splits=2, shuffle=True, random_state=41)
    # define the model
    model = SVC(kernel='poly')
    # define search space
    space = dict()
    space['C'] = [0.1, 1, 10, 100]
    space['degree'] = [2, 4]
    # define search
    search = GridSearchCV(model, space, scoring='accuracy', cv=cv_inner, refit=True)
    # execute search
    result = search.fit(X_train, y_train)
    # get the best performing model fit on the whole training set
    best_model = result.best_estimator_
    # evaluate model on the hold out dataset
    yhat = best_model.predict(X_test)
    # evaluate the model
    acc = accuracy_score(y_test, yhat)
    # store the result
    outer_results.append(acc)
    # report progress
    print('>acc=%.3f, est=%.3f, cfg=%s' % (acc, result.best_score_, result.best_params_))
# summarize the estimated performance of the model
print('Accuracy: %.3f (%.3f)' % (mean(outer_results), std(outer_results)))
# note: at this point best_model is only the model from the last outer fold
print(best_model)
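
For reference, here is a minimal sketch of one common way to read this setup. It is not necessarily the only valid answer. The idea is that the outer loop estimates how well the whole tuning procedure generalizes, so the per-fold winners are not meant to be merged; the same grid search is then rerun once on all of the data to get the single set of hyperparameters to report. The synthetic data from `make_classification`, the helper `make_search`, and the names `fold_params`, `param_counts`, `final_search`, `final_params` and `final_model` are illustrative assumptions, not part of the code above.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): replace with the real X and y.
X, y = make_classification(n_samples=200, n_features=10, random_state=41)

# Same search space as in the question.
space = {'C': [0.1, 1, 10, 100], 'degree': [2, 4]}


def make_search():
    """Build a fresh inner grid search (hypothetical helper)."""
    cv_inner = StratifiedKFold(n_splits=2, shuffle=True, random_state=41)
    return GridSearchCV(SVC(kernel='poly'), space, scoring='accuracy',
                        cv=cv_inner, refit=True)


# Outer loop: estimates the accuracy of "grid search + refit" as a procedure.
cv_outer = StratifiedKFold(n_splits=3, shuffle=True, random_state=41)
outer_scores = []
fold_params = []
for train_index, test_index in cv_outer.split(X, y):
    search = make_search().fit(X[train_index], y[train_index])
    # score() uses the 'accuracy' scorer on the refit best estimator
    outer_scores.append(search.score(X[test_index], y[test_index]))
    fold_params.append(tuple(sorted(search.best_params_.items())))

# How often each configuration won an outer fold (a stability check only).
param_counts = Counter(fold_params)

# Final step: rerun the same search on all data to get one set of parameters.
final_search = make_search().fit(X, y)
final_params = final_search.best_params_
final_model = final_search.best_estimator_

print('Nested CV accuracy: %.3f' % (sum(outer_scores) / len(outer_scores)))
print('Per-fold winners:', param_counts)
print('Final hyperparameters:', final_params)
```

Under this reading, the number to report is the mean of `outer_scores`, and the model to report is `final_model` with `final_params`. If `param_counts` shows the folds disagreeing a lot, that may suggest the candidates perform similarly or that the data are too few to separate them.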