confusionMatrix error: Error: `data` and `reference` should be factors with the same levels

Asked: 2020-01-18 13:31:52

Tags: r machine-learning classification confusion-matrix rpart

Edited question:

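I have a dataset with 699 rows. The exercise I am working on asks for a training set of 300 observations; the remaining rows form the test set. I have included all the information I can to make the situation as clear as possible.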

#First part of the code & Preprocessing
attach(Cancer_data)
names(Cancer_data)[1] <- "id"
names(Cancer_data)[2] <- "thickness"
names(Cancer_data)[3] <- "unif.size"
names(Cancer_data)[4] <- "unif.shape"
names(Cancer_data)[5] <- "adhesion"
names(Cancer_data)[6] <- "size"
names(Cancer_data)[7] <- "nuclei"
names(Cancer_data)[8] <- "chromatin"
names(Cancer_data)[9] <- "nucleoli"
names(Cancer_data)[10] <- "mitoses" 
names(Cancer_data)[11] <- "Prognosis"   
# Prognosis holds the class labels: 2 for benign, 4 for malignant
Prognosis <- as.factor(Cancer_data$Prognosis)
Cancer_data <- Cancer_data %>% dplyr :: select(-id)
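
One thing worth checking in the preprocessing above: `Prognosis <- as.factor(Cancer_data$Prognosis)` creates a standalone variable and leaves the column inside the data frame unchanged, so `Cancer_data$Prognosis` may still be numeric afterwards. A minimal sketch, assuming a hypothetical three-row stand-in called `toy` and illustrative labels `benign`/`malignant` (neither is from the original data):

```r
# Hypothetical three-row stand-in for Cancer_data
toy <- data.frame(thickness = c(5, 3, 8), Prognosis = c(2, 4, 2))

# This creates a separate object; the data frame column is not modified
Prognosis <- as.factor(toy$Prognosis)
is.factor(toy$Prognosis)   # FALSE

# Assigning back into the data frame converts the column itself
# (the labels are an assumption, chosen only for readability)
toy$Prognosis <- factor(toy$Prognosis,
                        levels = c(2, 4),
                        labels = c("benign", "malignant"))
is.factor(toy$Prognosis)   # TRUE
```

Converting the column before splitting means the training and test sets inherit the same factor levels.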

Moving straight on to the rpart model (I am not repeating the data split, which is straightforward), I fitted this classification tree with rpart:

rpart_model <- rpart(Prognosis ~.,method = "class",data = train_set)
# train_set was created earlier with caret::createDataPartition()
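
The split itself is not shown in the question. Note that `caret::createDataPartition()` takes a proportion `p` and samples within each class, so it will not necessarily return exactly 300 rows. A minimal sketch of an exact 300/399 split using base R `sample()`, assuming a hypothetical synthetic data frame `toy` in place of `Cancer_data`:

```r
set.seed(123)  # for a reproducible split

# Hypothetical stand-in for Cancer_data: 699 rows, one predictor, one label
n <- 699
toy <- data.frame(
  thickness = sample(1:10, n, replace = TRUE),
  Prognosis = factor(sample(c(2, 4), n, replace = TRUE), levels = c(2, 4))
)

# Draw exactly 300 row indices for training; the rest become the test set
train_idx <- sample(seq_len(n), size = 300)
train_set <- toy[train_idx, ]
test_set  <- toy[-train_idx, ]

nrow(train_set)  # 300
nrow(test_set)   # 399
```

Unlike `createDataPartition()`, this plain random draw is not stratified by class, which is a trade-off to be aware of when classes are imbalanced.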

Now for the main problem: when I predict on test_set to evaluate the tree and then try to get the confusionMatrix, R returns this error:

Error: `data` and `reference` should be factors with the same levels.

Here is the code I used:

y_hat <- predict(rpart_model,test_set)
confusionMatrix(Cancer_data$Prognosis,y_hat)
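
Three things in these two lines look like plausible causes of the error. For a classification tree, `predict()` on an rpart model returns a matrix of class probabilities unless `type = "class"` is given. The reference passed is the full `Cancer_data$Prognosis` rather than the labels of `test_set`. And `confusionMatrix()` expects the predictions as its first argument (`data`) and the true labels as its second (`reference`). A minimal sketch under those assumptions, using a hypothetical synthetic data frame `toy` in place of `Cancer_data`:

```r
library(rpart)

set.seed(42)

# Hypothetical stand-in for Cancer_data (699 rows, two predictors)
n <- 699
toy <- data.frame(
  thickness = sample(1:10, n, replace = TRUE),
  unif.size = sample(1:10, n, replace = TRUE)
)
# Synthetic label loosely tied to the predictors; stored as a factor
toy$Prognosis <- factor(
  ifelse(toy$thickness + toy$unif.size + rnorm(n) > 11, 4, 2),
  levels = c(2, 4)
)

train_idx <- sample(seq_len(n), size = 300)
train_set <- toy[train_idx, ]
test_set  <- toy[-train_idx, ]

rpart_model <- rpart(Prognosis ~ ., method = "class", data = train_set)

# type = "class" returns predicted labels as a factor, not probabilities
y_hat <- predict(rpart_model, newdata = test_set, type = "class")

# Make sure predictions and reference share exactly the same levels
y_hat <- factor(y_hat, levels = levels(test_set$Prognosis))

# Base R cross-tabulation of predictions against the test labels
cm <- table(Predicted = y_hat, Actual = test_set$Prognosis)
print(cm)

# With caret installed, the equivalent call would be:
# caret::confusionMatrix(data = y_hat, reference = test_set$Prognosis)
```

The key point is that both arguments are factors of the same length (one entry per test row) with identical levels.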

I also tried:

y_hat <- predict(rpart_model,type ='class')

as suggested in a previous post.
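
A possible reason this second attempt does not help: when `newdata` is omitted, `predict()` on an rpart model returns the fitted values for the training data, so the result has one entry per training row rather than per test row. A minimal sketch using the `kyphosis` dataset that ships with rpart (the 60/21 row split is arbitrary and only for illustration):

```r
library(rpart)

# kyphosis ships with rpart: 81 rows, factor response Kyphosis
train <- kyphosis[1:60, ]
test  <- kyphosis[61:81, ]

fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = train)

# No newdata: predictions for the 60 training rows
pred_train <- predict(fit, type = "class")
length(pred_train)   # 60

# newdata but no type: a matrix of class probabilities, not a factor
pred_prob <- predict(fit, newdata = test)
is.matrix(pred_prob) # TRUE

# Both newdata and type = "class": one predicted label per test row
pred_test <- predict(fit, newdata = test, type = "class")
length(pred_test)    # 21
```

Only the last form yields a factor that lines up row for row with the test labels.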

I apologize for the length of this question, but I wanted to be as precise as possible. Thank you in advance.

0 answers:

No answers yet.