Cross-validation in Python with scikit-learn

Time: 2018-03-08 06:51:41

Tags: python scikit-learn classification svm cross-validation

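I want to run cross-validation on my SVM classifier before using it on the actual test set. My question is: should I run cross-validation on the original dataset, or on the training set returned by train_test_split()?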

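import pandas as pd
from sklearn.model_selection import KFold, train_test_split, cross_val_score
from sklearn.svm import SVC

df = pd.read_csv('dataset.csv', header=None)
X = df.iloc[:, 0:10]  # first 10 columns are the features (positional indexing needs .iloc)
y = df.iloc[:, 10]    # column 10 is the label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=40)

seed = 40  # any fixed integer; makes the fold assignment reproducible
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)  # random_state requires shuffle=True

svm = SVC(kernel='poly')
results = cross_val_score(svm, X, y, cv=kfold)  # Cross-validation on the original set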

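import pandas as pd
from sklearn.model_selection import KFold, train_test_split, cross_val_score
from sklearn.svm import SVC

df = pd.read_csv('dataset.csv', header=None)
X = df.iloc[:, 0:10]  # first 10 columns are the features (positional indexing needs .iloc)
y = df.iloc[:, 10]    # column 10 is the label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=40)

seed = 40  # any fixed integer; makes the fold assignment reproducible
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)  # random_state requires shuffle=True

svm = SVC(kernel='poly')
results = cross_val_score(svm, X_train, y_train, cv=kfold)  # Cross-validation on the training set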

1 Answer:

Answer 0 (score: 3)

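It is best to always keep a test set that you use only once you are satisfied with your model, right before deployment. So do your train/test split first, then set the test set aside. We will not touch it.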

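Perform cross-validation on the training set only. For each of the k folds, you train on part of the training set and use the remainder as a validation set. Once you are satisfied with your model and the hyperparameters you have chosen, use the test set to get the final benchmark.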
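To make the fold mechanics concrete, here is a minimal sketch of what cross-validation does inside the training set. It assumes a synthetic dataset from make_classification as a stand-in for dataset.csv (10 features, binary label); the names fit_idx, val_idx and fold_scores are illustrative only. This is roughly what cross_val_score does for you, not a replacement for it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, train_test_split
from sklearn.svm import SVC

# Assumption: synthetic data standing in for dataset.csv.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hold out the test set first; it is not used anywhere below.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=40
)

kfold = KFold(n_splits=10, shuffle=True, random_state=40)
fold_scores = []
for fit_idx, val_idx in kfold.split(X_train):
    # Train on 9 folds of the training set, validate on the remaining fold.
    model = SVC(kernel="poly")
    model.fit(X_train[fit_idx], y_train[fit_idx])
    fold_scores.append(model.score(X_train[val_idx], y_train[val_idx]))

print(f"mean validation accuracy over {len(fold_scores)} folds: {np.mean(fold_scores):.3f}")
```

Every validation fold here is carved out of X_train, so X_test stays unseen until the very end.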

The second snippet in your question, the one that cross-validates on X_train and y_train, is the correct one.
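The whole workflow described in this answer could be sketched as follows. This is one possible way to do it, under assumptions: the data is synthetic (make_classification standing in for dataset.csv), the hyperparameter grid is purely illustrative, and GridSearchCV is used here as a convenient way to run cross-validation for several candidate settings; the original question only used cross_val_score.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.svm import SVC

# Assumption: synthetic data standing in for dataset.csv.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Step 1: split once and put the test set aside.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=40
)

# Step 2: choose hyperparameters by cross-validation on the training set only.
param_grid = {"C": [0.1, 1, 10], "degree": [2, 3]}  # illustrative values
cv = KFold(n_splits=5, shuffle=True, random_state=40)
search = GridSearchCV(SVC(kernel="poly"), param_grid, cv=cv)
search.fit(X_train, y_train)

print("best hyperparameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)

# Step 3: a single final evaluation on the untouched test set.
# By default GridSearchCV refits the best model on all of X_train.
test_accuracy = search.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```

The test accuracy is only meaningful as a final benchmark if you do not go back and retune after looking at it.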