Question

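I am using scikit-learn to train an SVM. I am using cross-validation to evaluate the estimator and to avoid overfitting the model.

I split the data into two parts: training data and test data. Here is the code: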

import numpy as np
from sklearn import cross_validation
from sklearn import datasets
from sklearn import svm

# Load the iris dataset used below.
iris = datasets.load_iris()

X_train, X_test, y_train, y_test = cross_validation.train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0
)
clf = svm.SVC(kernel='linear', C=1)
scores = cross_validation.cross_val_score(clf, X_train, y_train, cv=5)
print(scores)

# Now I need to evaluate the estimator *clf* on X_test.
clf.score(X_test, y_test)
# Here I get an error saying that the model has not been fitted with fit().
# But isn't the model normally fitted inside cross_val_score? What is the problem?

Answer 1

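cross_val_score is essentially a convenience wrapper around the sklearn cross-validation iterators. You give it a classifier and your whole (training + validation) dataset, and it automatically performs one or more rounds of cross-validation: it splits your data into train/validation sets, fits on each training set, and computes the score on the corresponding validation set. For examples and further explanation, see the scikit-learn documentation on cross-validation.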
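To make the description above concrete, here is a rough sketch of what cross_val_score does for this setup. It is not the library's actual implementation. It assumes a recent scikit-learn in which these helpers live in sklearn.model_selection (the module that replaced sklearn.cross_validation), and it assumes that an integer cv with a classifier means unshuffled stratified folds. The names manual_scores, library_scores and fold_clf are mine, chosen for illustration.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.base import clone
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0
)
clf = svm.SVC(kernel="linear", C=1)

# Hand-written version of the cross-validation loop.
manual_scores = []
for train_idx, val_idx in StratifiedKFold(n_splits=5).split(X_train, y_train):
    # Each fold gets a fresh, unfitted copy; clf itself is never fitted.
    fold_clf = clone(clf)
    fold_clf.fit(X_train[train_idx], y_train[train_idx])
    manual_scores.append(fold_clf.score(X_train[val_idx], y_train[val_idx]))
manual_scores = np.array(manual_scores)

# The convenience wrapper, for comparison.
library_scores = cross_val_score(clf, X_train, y_train, cv=5)

print(manual_scores)
print(library_scores)
```

Under those assumptions the two arrays should agree, and in both cases the fitted models are the per-fold copies, which are discarded once their scores have been recorded.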

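The reason clf.score(X_test, y_test) raises an exception is that cross_val_score performs the fitting on a copy of the estimator rather than on the original (see the use of clone(estimator) in the source code). Because of this, clf remains unchanged outside the function call, and is therefore still unfitted when you call clf.score. To score on the test set, fit clf on the training data yourself first.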
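A minimal sketch of the usual way around this, assuming a recent scikit-learn (sklearn.model_selection instead of the older sklearn.cross_validation, and NotFittedError from sklearn.exceptions): use cross_val_score only to estimate performance, then fit clf explicitly on the training data before scoring it on the held-out test set. The variable unfitted_after_cv is only there to illustrate the point.

```python
from sklearn import datasets, svm
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_val_score, train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0
)

clf = svm.SVC(kernel="linear", C=1)

# Cross-validation on the training portion; this fits clones, not clf.
scores = cross_val_score(clf, X_train, y_train, cv=5)
print("CV scores:", scores)

# clf is still unfitted here, so scoring it fails.
try:
    clf.score(X_test, y_test)
    unfitted_after_cv = False
except NotFittedError:
    unfitted_after_cv = True
print("clf still unfitted after cross_val_score:", unfitted_after_cv)

# Fit explicitly on the training data, then evaluate on the test data.
clf.fit(X_train, y_train)
test_score = clf.score(X_test, y_test)
print("Held-out test accuracy:", test_score)
```

The cross-validation scores tell you how well this choice of kernel and C generalizes. The final fit on all of X_train produces the model you actually evaluate on X_test and use afterwards.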
