Question

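I'm new to Python, please help. I applied cross-validation to the training set (60% of the dataset), and now I'm trying to work out how to test the classifier on the rest of the data (the test set, 40%). I used the following code: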

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('NvDataSet.csv', sep=';')
dataset = dataset.dropna()
print(dataset.info())
#dataset = pd.read_csv('Urban1.csv')
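# Features: every column except the last one
X = dataset.iloc[:, :-1].values
# Labels: column 76 (assumed to be the last column, otherwise X would contain the label)
y = dataset.iloc[:, 76].values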

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting SVM to the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Computing the accuracy on the test set
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

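Before running cross-validation I observed the result "accuracy: 95.97%" on the test set. Then I applied the cross-validation function to my training set (X_train, y_train):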

# 10-fold cross-validation on the training set only
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print(accuracies.mean())

"The mean cross-validation accuracy is 93.58%"

What should I do now to evaluate the cross-validated classifier on the test set (X_test and y_test)?

y_pred = classifier.predict(X_test)

This gives the same result as before cross-validation, accuracy = 95.97%, with no change. Is that expected?
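A minimal sketch of why the test accuracy would not be expected to change, offered as an illustration rather than a definitive answer. As far as I know, cross_val_score fits clones of the estimator on each fold and leaves the already-fitted object untouched, so predicting on X_test before and after the call uses the same model. The synthetic data from make_classification is an assumption standing in for NvDataSet.csv, and the names model, acc_before and acc_after are hypothetical. The scaler is wrapped in a pipeline here so that it is refit inside each fold, which differs slightly from the code above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic data stands in for NvDataSet.csv, which is not available here.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Scaling and the SVM are combined so that each CV fold fits its own scaler.
model = make_pipeline(StandardScaler(), SVC(kernel='rbf', random_state=0))

# Fit once on the full training set and score on the held-out test set.
model.fit(X_train, y_train)
acc_before = accuracy_score(y_test, model.predict(X_test))

# Cross-validation works on clones of `model`; the fitted `model` is not modified.
cv_scores = cross_val_score(model, X_train, y_train, cv=10)

# Predicting again uses the same fitted model, so the score is identical.
acc_after = accuracy_score(y_test, model.predict(X_test))

print("CV mean accuracy:", cv_scores.mean())
print("Test accuracy before CV:", acc_before)
print("Test accuracy after CV:", acc_after)
```

Read this way, the cross-validation mean is an estimate obtained from the training data alone, while the test-set accuracy is a separate, one-time check of the model fitted on the whole training set.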

python - How to use the "test" dataset after cross-validation?

0 answers: