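I have trained an SVM model and am trying to build a confusion matrix to evaluate it. To do that, I run predictions on the test data and compare the predicted classes with the target classes of the test data.
I have about 1000 data records, of which nearly 300 make up the test data. I have defined nine classes/labels.
The features are scaled to the range -1 to 1 and are all floats. Each row of array A represents one data record, and the target classes are stored in array B. I split these arrays 70:30 into training and test data.
This should be simple code, but I am out of ideas right now. One possibility would be to run a prediction and build a confusion matrix for every single record of the test data and store the results in a list. After iterating over all records, could I then take the mean of all stored elements?
Does anyone know how to solve my problem?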
# -*- coding: utf-8 -*-
"""
Created on Fri Apr 5 10:50:47 2019
@author: mattdoe
"""
from data_preprocessor_db import data_storage # validation data
from sklearn.preprocessing import MinMaxScaler
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from numpy import array
import pickle
# containers for separating data_storage
# Link_ID = list()
Input, Output = list(), list()
# separate data_storage into Input and Output data
for items in data_storage:
    # Link_ID = items[0] # identifier not needed
    Input.append(tuple(float(x) for x in items[1:10])) # Input: all nine characteristics
    Output.append(float(items[10])) # Output: scenario_class 1 to 9
# Input tuple to array
A = array(Input)
# normalise array between -1 and 1
scaler = MinMaxScaler(feature_range=(-1, 1))
scaledA = scaler.fit_transform(A)
# Output tuple to array
B = array(Output)
# split train and test data; ratio: 70:30
# shuffle = False: keeps the original order of the data
# shuffle = True: default: shuffles the data before splitting
A_train, A_test, B_train, B_test = train_test_split(scaledA, B, test_size=0.3, shuffle=True, random_state=40)
# create model
model = svm.SVC(kernel='linear', C = 1.0)
# fit model
model.fit(A_train, B_train)
# get support vectors
# model.support_vectors_
# get indices of support vectors
# model.support_
# get number of support vectors for each class
# model.n_support_
# save the model to disk
filename = 'ml_svm.sav'
with open(filename, 'wb') as f:
    pickle.dump(model, f)
# load the model from disk
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
# test to all data records
# result = loaded_model.score(A, B)
# test with test data
# score represents the mean accuracy of given test data and labels
result = loaded_model.score(A_test, B_test) # relative
print(result)
# confusion matrix compares true value with predicted value
# true value <--> predicted value
predicted = model.predict(A_test)
tn, fp, fn, tp = confusion_matrix(B_test, predicted, labels=[1, 2, 3, 4, 5, 6, 7, 8, 9]).ravel()
My error:
Traceback (most recent call last):
  File "<ipython-input-8-8649dd873bbd>", line 1, in <module>
    runfile('C:/Workspace/Master-Thesis/Programm/MapValidationML/ml_svm.py', wdir='C:/Workspace/Master-Thesis/Programm/MapValidationML')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Workspace/Master-Thesis/Programm/MapValidationML/ml_svm.py", line 75, in <module>
    tn, fp, fn, tp = confusion_matrix(B_test, predicted, labels=[1, 2, 3, 4, 5, 6, 7, 8, 9]).ravel()
ValueError: too many values to unpack (expected 4)
Answer 0 (score: 0)
Thanks to elgordorafiki. The solution using confusion_vector = confusion_matrix(...) works well.
Without .ravel(), I now get a 9x9 matrix.
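The original ValueError can be reproduced in isolation. The sketch below uses plain NumPy arrays as stand-ins for the output of confusion_matrix (the counts are made up for illustration): a two-class matrix flattens to exactly four values, which is the only case where the `tn, fp, fn, tp` unpacking fits, while a nine-class matrix flattens to 81 values.

```python
import numpy as np

# Stand-in for a binary confusion matrix (made-up counts): 2 classes -> 2x2.
binary_cm = np.array([[50, 2],
                      [3, 45]])
tn, fp, fn, tp = binary_cm.ravel()  # exactly 4 values, so unpacking works

# Stand-in for the nine-class case: 9 classes -> 9x9 matrix.
multi_cm = np.zeros((9, 9), dtype=int)
flat = multi_cm.ravel()  # 81 values after flattening

try:
    a, b, c, d = flat  # same unpacking pattern as the failing line
    message = None
except ValueError as exc:
    message = str(exc)

print(binary_cm.ravel().size, flat.size)
print(message)
```

So with more than two classes, the matrix is best kept as a whole and indexed by class rather than unpacked into four names.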
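Are the values on the diagonal all the correct predictions, and all off-diagonal values the incorrect ones? So does each column and each row represent one class? Which are the predicted classes: the columns or the rows?
How should I interpret the result?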
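As a rough guide to reading such a matrix: scikit-learn's confusion_matrix puts the true classes on the rows and the predicted classes on the columns, so the diagonal holds the correct predictions and everything off the diagonal is a misclassification. The sketch below rebuilds that layout by hand with NumPy on made-up labels (y_true, y_pred and the three classes are illustrative assumptions, not the data from this thread) and derives per-class TP, FN, FP and TN counts from it.

```python
import numpy as np

# Toy labels (hypothetical, not the thread's data): three classes 1..3.
y_true = np.array([1, 1, 1, 2, 2, 3, 3, 3])
y_pred = np.array([1, 1, 2, 2, 2, 1, 3, 3])
labels = [1, 2, 3]

# Count matrix laid out like scikit-learn's confusion_matrix:
# row index = true class, column index = predicted class.
index = {label: i for i, label in enumerate(labels)}
cm = np.zeros((len(labels), len(labels)), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[index[int(t)], index[int(p)]] += 1

tp = np.diag(cm)               # correct predictions per class (diagonal)
fn = cm.sum(axis=1) - tp       # rest of each row: true class i, predicted as something else
fp = cm.sum(axis=0) - tp       # rest of each column: predicted class i, actually something else
tn = cm.sum() - tp - fn - fp   # every record that involves class i in neither role

print(cm)
for label, a, b, c, d in zip(labels, tp, fn, fp, tn):
    print(f"class {label}: TP={a} FN={b} FP={c} TN={d}")
```

In other words, a multi-class matrix has one TP/FN/FP/TN quadruple per class rather than a single one for the whole model.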
My result looks like this:
[[ 35   1   0   0   0   0   0   0   0]
 [  0 177   0   0   0   0   0   0   0]
 [  3   2   0   0   0   0   0   0   0]
 [  2   3   0   0   0   0   0   0   0]
 [  0   0   0   0   5   0   0   0   0]
 [  0   0   0   0   0   8   0   0   0]
 [  0   0   0   0   0   0   3   0   0]
 [  0   0   0   0   0   0   0   7   0]
 [  4   6   0   0   1   1   1   0  14]]
In my case, classes 3 and 4 seem to be confused with classes 1 and 2.
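That impression can be checked numerically. The sketch below copies the matrix posted above and assumes that its rows are the true classes and its columns the predicted classes, both in the order 1 to 9 (which matches the labels argument in the question and scikit-learn's convention). It computes the overall accuracy, the per-class recall, and the classes that were never predicted at all.

```python
import numpy as np

# Confusion matrix copied from the result above.
# Assumption: rows = true classes 1..9, columns = predicted classes 1..9.
cm = np.array([
    [35,   1, 0, 0, 0, 0, 0, 0,  0],
    [ 0, 177, 0, 0, 0, 0, 0, 0,  0],
    [ 3,   2, 0, 0, 0, 0, 0, 0,  0],
    [ 2,   3, 0, 0, 0, 0, 0, 0,  0],
    [ 0,   0, 0, 0, 5, 0, 0, 0,  0],
    [ 0,   0, 0, 0, 0, 8, 0, 0,  0],
    [ 0,   0, 0, 0, 0, 0, 3, 0,  0],
    [ 0,   0, 0, 0, 0, 0, 0, 7,  0],
    [ 4,   6, 0, 0, 1, 1, 1, 0, 14],
])
labels = np.arange(1, 10)

support = cm.sum(axis=1)            # number of test records per true class
recall = np.diag(cm) / support      # share of each true class predicted correctly
accuracy = np.trace(cm) / cm.sum()  # correct predictions over all test records

# Classes whose column is all zeros were never predicted by the model.
never_predicted = labels[cm.sum(axis=0) == 0]

print(f"accuracy: {accuracy:.3f}")
for label, n, r in zip(labels, support, recall):
    print(f"class {label}: support={n}, recall={r:.3f}")
print("never predicted:", never_predicted.tolist())
```

Under that assumption, classes 3 and 4 have a recall of zero: each has only five test records, and all of them end up in class 1 or class 2.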