r - How to create a confusion matrix from a dataset

Time: 2018-10-04 10:15:18

Tags: r

Can anyone advise me and help me create a confusion matrix for an SVM model? I am getting the following error:

"Error: 'data' and 'reference' should be factors with the same levels." 

It comes from this confusion matrix call:

confusionMatrix(predA, tmp_test$Score)

I have also tried

confusionMatrix(table(predA, tmp_test)) 

and then I got the following error:

"Error in table(predA, tmp_test) : all arguments must have the same length"

The SVM model is a regression model.

Sample of the data:

Unhelpful Score 
7     1
8     3
5     1
7     2
4     1 
4     1
5     1
9     2
6     1
5     1
11    3

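There are 2108 obs. of 2 variables. There are no missing or invalid values and no 0 (zero) values. Unhelpful ranges from 4 to 2016, and Score ranges from 1 to 3.

Here is my code: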

# Random sampling
samplesize = 0.60 * nrow(dsTemp)
set.seed(80)
index = sample(seq_len(nrow(dsTemp)), size = samplesize)

# Create training and test set
datatrain = dsTemp[ index, ]
datatest = dsTemp[ -index, ]


library(caret)
library(e1071)
library(tidyverse)

tmp_train <-datatrain
tmp_test <- datatest

# originally the data types were int, but I had to change them to factor
# for the model to work
dsTemp$Score <- factor(dsTemp$Score)
dsTemp$Unhelpful <- factor(dsTemp$Unhelpful)

dsTemp$Unhelpful <- factor(dsTemp$Unhelpful)
dsTemp$Score <- factor(dsTemp$Score)

# svm model (named svmModel so that it matches the predict() call below)
svmModel <- svm(Score ~ ., data = tmp_train, kernel = 'linear', gamma = 0.2, cost = 100)

# predictions
predA <- predict(svmModel, tmp_test)

Edit

tmp_train$Score <- factor(tmp_train$Score)
tmp_test$Score <- factor(tmp_test$Score)

tmp_train$HelpfulnessDenominator <- factor(tmp_train$HelpfulnessDenominator)
tmp_test$HelpfulnessDenominator <- factor(tmp_test$HelpfulnessDenominator)

After that, these calls still raise errors:
confusionMatrix(predA, tmp_test) 

confusionMatrix(table(predA, tmp_test))  

str(predA)
 Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "names")= chr [1:1264] "927" "1179" "1655" "156" …

str(tmp_test$Score)
Factor w/ 3 levels "1","2","3": 1 3 3 3 1 1 1 2 2 3 ...

1 Answer:

Answer 0 (score: 0)

It looks like you did not convert the columns to factors in the training and test sets, but in dsTemp instead:

dsTemp$Score <- factor(dsTemp$Score)
dsTemp$Unhelpful <- factor(dsTemp$Unhelpful)

dsTemp$Unhelpful <- factor(dsTemp$Unhelpful)
dsTemp$Score <- factor(dsTemp$Score) #also this is just a repetition

Instead, it should be:

tmp_train$Score <- factor(tmp_train$Score)
tmp_test$Score <- factor(tmp_test$Score)
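
To illustrate why converting dsTemp after the split has no effect, here is a minimal sketch in base R. The toy values, the fixed row split, and the name score_levels are assumptions made for illustration; they are not the asker's data or split.

```r
# Toy stand-in for dsTemp (hypothetical values, not the asker's data)
dsTemp <- data.frame(Unhelpful = c(7, 8, 5, 7, 4, 4),
                     Score     = c(1L, 3L, 1L, 2L, 1L, 1L))

# Split first, as in the question (fixed rows here instead of sample())
tmp_train <- dsTemp[1:4, ]
tmp_test  <- dsTemp[5:6, ]

# Converting the original afterwards leaves the subsets untouched,
# because subsetting created independent copies
dsTemp$Score <- factor(dsTemp$Score)
is.factor(dsTemp$Score)     # the original is now a factor
is.factor(tmp_train$Score)  # the subset is still an integer column

# Convert the subsets themselves, sharing one explicit set of levels
score_levels <- c(1, 2, 3)
tmp_train$Score <- factor(tmp_train$Score, levels = score_levels)
tmp_test$Score  <- factor(tmp_test$Score,  levels = score_levels)
```

Passing the levels explicitly keeps both subsets on the same three levels even when a class happens to be missing from one of them, as in the two-row test set here.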

because these are the data sets you use later:

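# svm model (the object name must match the one passed to predict();
# gamma is not used by the linear kernel)
svmModel <- svm(Score ~ ., data = tmp_train, kernel = 'linear', gamma = 0.2, cost = 100)

# predictions
predA <- predict(svmModel, tmp_test)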
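
The type of Score matters because svm() from e1071 picks regression or classification from the response. The sketch below assumes e1071 is installed and uses a hypothetical toy data frame with Unhelpful kept numeric; it is not the asker's data, and cost = 1 is chosen only to keep the toy fit quick.

```r
library(e1071)  # assumed to be installed

# Hypothetical toy data, not the asker's data set
toy <- data.frame(Unhelpful = c(4, 5, 6, 5, 7, 9, 8, 9, 11, 12, 14, 13),
                  Score     = rep(1:3, each = 4))

# Numeric response: svm() fits a regression, predict() returns numbers
fit_reg  <- svm(Score ~ ., data = toy, kernel = "linear", cost = 1)
pred_reg <- predict(fit_reg, toy)

# Factor response: svm() fits a classifier, predict() returns a factor
toy_cls  <- transform(toy, Score = factor(Score))
fit_cls  <- svm(Score ~ ., data = toy_cls, kernel = "linear", cost = 1)
pred_cls <- predict(fit_cls, toy_cls)

is.numeric(pred_reg)  # numeric predictions cannot go into confusionMatrix()
is.factor(pred_cls)   # factor predictions can
levels(pred_cls)
```

Numeric predictions from a regression fit are one way to run into the "should be factors with the same levels" error, since confusionMatrix() expects class labels on both sides.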

And this is the correct call to confusionMatrix:

confusionMatrix(predA, tmp_test$Score)
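
As a rough check that does not depend on caret, the same cross-tabulation can be sketched in base R. The vectors pred and truth below are hypothetical values made up for illustration, not the asker's predictions.

```r
# Hypothetical predictions and reference labels (not the asker's results)
pred  <- c(1, 1, 1, 2, 1, 3)
truth <- c(1, 3, 3, 2, 1, 3)

# Give both vectors the same levels, even if a class is never predicted
score_levels <- c("1", "2", "3")
pred_f  <- factor(pred,  levels = score_levels)
truth_f <- factor(truth, levels = score_levels)

# Rows are predictions, columns are the reference labels
cm <- table(Predicted = pred_f, Reference = truth_f)
print(cm)

# Overall accuracy: correct predictions on the diagonal over all cases
accuracy <- sum(diag(cm)) / sum(cm)
```

Two factors built this way can also be passed to caret as confusionMatrix(pred_f, truth_f). Note that both arguments are single vectors of equal length: passing the whole data frame tmp_test instead of the tmp_test$Score column is what produces the "all arguments must have the same length" error from table().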