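I'm new to Python, please help. I applied cross-validation to the training set (60% of the data), and now I'm trying to work out how to test the classifier on the rest of the data (the test set, 40%). I used the following code: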
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('NvDataSet.csv', sep=';')
dataset = dataset.dropna()
print(dataset.info())
#dataset = pd.read_csv('Urban1.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 76].values  # label column; assumed to be the last column, matching X above
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Fitting SVM to the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Computing the accuracy
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
Before running cross-validation I got "Accuracy: 95.97%". I then applied the cross-validation function to my training set:
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print(accuracies.mean())
"The mean cross-validation accuracy is 93.58%."
What should I do now to evaluate the classifier that I checked with cross-validation on the test set (X_test and y_test)?
y_pred = classifier.predict(X_test)
This gives the same result as before cross-validation, accuracy = 95.97%, with no change. Is that expected?
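To make the situation reproducible without NvDataSet.csv, here is a minimal sketch of the same steps on synthetic data. The data from `make_classification` and the names `acc_before`, `acc_after` and `cv_scores` are assumptions introduced for illustration and are not part of the original script. It relies on the fact that `cross_val_score` fits clones of the estimator on each fold, so the `classifier` object fitted earlier should be left untouched, and the test-set accuracy would then be identical before and after the cross-validation call.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for NvDataSet.csv (hypothetical data, not the real file)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

# Same scaling steps as in the question
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fit once on the whole training set and score on the held-out test set
classifier = SVC(kernel='rbf', random_state=0)
classifier.fit(X_train, y_train)
acc_before = accuracy_score(y_test, classifier.predict(X_test))

# cross_val_score trains clones of `classifier`, one per fold;
# the already-fitted `classifier` object is not refitted or modified
cv_scores = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)

# Predicting again with the same fitted object gives the same test accuracy
acc_after = accuracy_score(y_test, classifier.predict(X_test))

print("mean CV accuracy:", cv_scores.mean())
print("test accuracy before CV:", acc_before)
print("test accuracy after CV:", acc_after)
```

Under these assumptions, the cross-validation mean is an estimate taken from the training data only, while the test accuracy comes from the single model fitted on the full training set, so the two numbers answer different questions and need not match.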