Question

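I'm training a model on one dataset and then testing it on several other datasets.

To improve performance, I want to fine-tune the parameters using 5-fold cross-validation.

However, I don't think my code is correct: when I try to apply the model to my test set, it says the model is not fitted yet. Cross-validation fitted the model, so can't I just use it? Or do I have to extract it somehow?

Here is my code: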

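from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

svm = SVC(kernel='rbf', probability=True, random_state=42)

accuracies = cross_val_score(svm, data_train, lbs_train, cv=5)

pred_test = svm.predict(data_test)  # raises NotFittedError
accuracy = accuracy_score(lbs_test, pred_test)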

Answer 1

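Right: cross_val_score does not return a fitted model. In your example you have cv=5, which means the model is fitted 5 times, each time on a fresh clone of svm, so the svm object you passed in is never fitted itself. So which of the five fitted models would you want? The last one?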
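To make the "fitted 5 times" point concrete, here is a simplified sketch of roughly what cross_val_score does (it ignores parallelism and error handling). For a classifier and an integer cv, scikit-learn splits with StratifiedKFold without shuffling, fits a clone of the estimator on each training fold and scores it on the held-out fold. The make_classification data is a synthetic stand-in for your data, and probability=True is omitted only to keep it fast.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
svm = SVC(kernel='rbf', random_state=42)

fold_models = []
fold_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    model = clone(svm)  # fresh, unfitted copy; `svm` itself is never touched
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    fold_models.append(model)

manual = np.array(fold_scores)
builtin = cross_val_score(svm, X, y, cv=5)
print(len(fold_models))              # 5 fitted models, one per fold
print(np.allclose(manual, builtin))  # True
```

cross_val_score keeps only the scores and throws the five fitted clones away, which is why nothing fitted comes back to you.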

The function cross_val_score is a simpler version of sklearn.model_selection.cross_validate, which returns not only the scores but also more information.
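As a quick sketch of the relationship (again on synthetic make_classification data, an assumption), cross_validate's 'test_score' entry holds the same per-fold scores that cross_val_score returns, alongside timing information and, on request, training scores:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.svm import SVC

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
svm = SVC(kernel='rbf', random_state=42)

scores = cross_val_score(svm, X, y, cv=5)
cv_results = cross_validate(svm, X, y, cv=5, return_train_score=True)

print(sorted(cv_results))  # ['fit_time', 'score_time', 'test_score', 'train_score']
print(np.allclose(scores, cv_results['test_score']))  # True
```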

So you can do the following:

from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

svm = SVC(kernel='rbf', probability=True, random_state=42)

cv_results = cross_validate(svm, data_train, lbs_train, cv=5, return_estimator=True)
# cv_results is a dict with the following keys:
# 'test_score'  - the per-fold scores, i.e. what cross_val_score returns
# 'fit_time'
# 'score_time'
# 'train_score' - only if return_train_score=True
# 'estimator'   - the cv fitted estimators (one per fold), only if return_estimator=True

accuracies = cv_results['test_score']  # what you had before

svms = cv_results['estimator']
print(len(svms))  # 5

svm = svms[-1]  # the SVM fitted on the last fold, or pick any one you want

pred_test = svm.predict(data_test)
accuracy = accuracy_score(lbs_test, pred_test)
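A small check of why your original svm.predict call failed while the estimators returned by cross_validate work: the object you pass in stays unfitted, and the returned estimators are separate fitted clones. This sketch uses check_is_fitted from sklearn.utils.validation; the is_fitted helper and the synthetic data are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Hypothetical helper: True if check_is_fitted accepts the estimator."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
svm = SVC(kernel='rbf', random_state=42)

cv_results = cross_validate(svm, X, y, cv=5, return_estimator=True)

print(is_fitted(svm))  # False: the object you passed in was never fitted
print([is_fitted(est) for est in cv_results['estimator']])  # [True, True, True, True, True]
print(any(est is svm for est in cv_results['estimator']))  # False: they are clones
```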

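Note that here you have to pick one of the five fitted SVMs. Ideally, you use cross-validation to estimate your model's performance, so you don't need to evaluate it again at the end. Then you fit the model one more time, this time on all the data, and that is the model you would actually use in production.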
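A minimal sketch of that workflow, assuming a train_test_split of synthetic data stands in for your separate training and test datasets: cross-validate to estimate performance, refit the same estimator once on all of the training data, then predict on the test set. Because svm is fitted explicitly in step 2, the predict call no longer raises NotFittedError.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic data and split standing in for the question's datasets (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
data_train, data_test, lbs_train, lbs_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

svm = SVC(kernel='rbf', probability=True, random_state=42)

# 1. Estimate generalization performance with 5-fold CV (fits 5 clones)
accuracies = cross_val_score(svm, data_train, lbs_train, cv=5)
print(f"CV accuracy: {accuracies.mean():.3f} +/- {accuracies.std():.3f}")

# 2. Refit once on all of the training data; this is the model you keep
svm.fit(data_train, lbs_train)

# 3. Evaluate the final model on the held-out test set
pred_test = svm.predict(data_test)
accuracy = accuracy_score(lbs_test, pred_test)
print(f"Test accuracy: {accuracy:.3f}")
```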

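One more note: you mentioned that you want to fine-tune the model's parameters. You should probably look into hyperparameter optimization, for example: https://datascience.stackexchange.com/a/36087/54395. There you'll see how to use cross-validation and define a parameter search space.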
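One common way to do this in scikit-learn is GridSearchCV, sketched below. The parameter grid is purely illustrative and the data is synthetic (both assumptions). With its default refit=True, GridSearchCV refits the best parameter combination on the whole training set, so best_estimator_ is a fitted model you can use directly on your test data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic data and split standing in for the question's datasets (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
data_train, data_test, lbs_train, lbs_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Illustrative search space (assumption; tune it for your data)
param_grid = {'C': [0.1, 1, 10], 'gamma': ['scale', 0.01, 0.1]}

search = GridSearchCV(SVC(kernel='rbf', random_state=42), param_grid,
                      cv=5, scoring='accuracy')
search.fit(data_train, lbs_train)  # refit=True by default: best params refit on all training data

print(search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.3f}")

pred_test = search.best_estimator_.predict(data_test)
print(f"Test accuracy: {accuracy_score(lbs_test, pred_test):.3f}")
```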
