python - How do I use the "test" dataset after cross-validation?

Asked: 2020-01-27 11:51:32

Tags: python machine-learning

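I am new to Python, please help. I applied cross-validation on the training set (60% of the dataset), and now I am trying to work out how to test the classifier on the rest of the data (the test set, 40%). I used the following code: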

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('NvDataSet.csv', sep=';')
dataset = dataset.dropna()
print(dataset.info())
#dataset = pd.read_csv('Urban1.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:,76].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting SVM to the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Computing the accuracy on the test set
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

Before cross-validation I observed the result "accuracy: 95.97%", and then I applied the cross-validation function on my training set.

from sklearn.model_selection import cross_val_score
# 10-fold cross-validation on the training set only
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print(accuracies.mean())

"The mean cross-validation accuracy is 93.58%"

Now what should I do to evaluate the classifier that was assessed with cross-validation on the test set, X_test and y_test?

y_pred = classifier.predict(X_test)

This gives the same result as before cross-validation, accuracy = 95.97%. Is it expected that nothing changes?
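A minimal sketch of why the number may not move: as far as I understand scikit-learn, cross_val_score fits clones of the estimator on each fold and leaves the already fitted classifier untouched, so predicting on X_test before and after the call should give identical results. The data below is a synthetic stand-in generated with make_classification (an assumption, since NvDataSet.csv is not available here), and the names acc_before and acc_after are introduced only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for NvDataSet.csv (assumption: the real data is not available)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Same 60/40 split as in the question
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Scale using statistics learned from the training set only
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fit once on the full training set and score on the held-out test set
classifier = SVC(kernel='rbf', random_state=0)
classifier.fit(X_train, y_train)
acc_before = accuracy_score(y_test, classifier.predict(X_test))

# cross_val_score trains clones of the estimator on each fold;
# the fitted `classifier` object above is not modified by this call
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)

# Scoring the same fitted classifier again gives the same test accuracy
acc_after = accuracy_score(y_test, classifier.predict(X_test))

print("mean CV accuracy:", accuracies.mean())
print("test accuracy before CV:", acc_before)
print("test accuracy after CV:", acc_after)
```

Read this way, the cross-validation mean is an estimate obtained from the training data alone, while the score on X_test and y_test is a separate final check of the classifier fitted on the whole training set.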

0 answers:

No answers