Question

I trained a tree model with the caret package in R. I'm now trying to generate a confusion matrix and keep getting the following error:

Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

prob <- 0.5 #Specify class split
singleSplit <- createDataPartition(modellingData2$category, p=prob,
                                   times=1, list=FALSE)
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5)
traindata <- modellingData2[singleSplit,]
testdata <- modellingData2[-singleSplit,]
treeFit <- train(traindata$category~., data=traindata,
                 trControl=cvControl, method="rpart", tuneLength=10)
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)

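The error occurs when generating the confusion matrix. The levels are the same on both objects, and I can't figure out what the problem is. Their structure and levels are given below. They should be identical. Any help would be greatly appreciated, as this is driving me crazy!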

> str(predictionsTree)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
> str(testdata$category)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...

> levels(predictionsTree)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"   

> levels(testdata$category)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"

Answer 1

Try using:

confusionMatrix(table(Argument 1, Argument 2))

This worked for me.
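
One caveat, as far as I can tell: confusionMatrix() on a table expects a square table whose row and column classes match, so it helps to put both factors on a shared level set first. Below is a minimal base-R sketch with made-up toy factors; pred and ref are hypothetical stand-ins for predictionsTree and testdata$category, and the caret call is left commented out.

```r
# Toy stand-ins; not the asker's data.
ref  <- factor(c("a", "b", "c", "a", "b"))
pred <- factor(c("a", "b", "a", "a", "b"))   # the model never predicts "c"

# Without alignment the table is 2 x 3, i.e. not square.
dim(table(pred, ref))

# Re-level the predictions onto the reference levels.
pred_aligned <- factor(pred, levels = levels(ref))
tab <- table(Prediction = pred_aligned, Reference = ref)
dim(tab)   # 3 x 3 after alignment

# With caret loaded, this table could then be passed on:
# caret::confusionMatrix(tab)
```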

Answer 2

Maybe your model is not predicting one of the factor levels. Use the table() function instead of confusionMatrix() to see whether that is the problem.
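
A rough sketch of such a check follows, using made-up data whose column names mirror the question. It also tests whether the reference column exists at all, since `$` on a data frame returns NULL for a misspelled name instead of raising an error.

```r
# Toy stand-ins; the names mirror the question, the values are made up.
testdata <- data.frame(category = factor(c("x", "y", "z", "x")))
predictionsTree <- factor(c("x", "y", "x", "x"), levels = c("x", "y", "z"))

# 1. A misspelled column name silently returns NULL.
is.null(testdata$catgeory)   # TRUE: no such column
is.null(testdata$category)   # FALSE

# 2. Cross-tabulate to see which classes are never predicted.
tab <- table(Prediction = predictionsTree, Reference = testdata$category)
never_predicted <- names(which(rowSums(tab) == 0))

# 3. Levels present in the reference but missing from the predictions.
missing_levels <- setdiff(levels(testdata$category), levels(predictionsTree))
```

Here never_predicted lists classes the model never outputs, and missing_levels lists classes the prediction factor does not know about at all.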

Answer 3

Try setting na.action to na.pass:

predictionsTree <- predict(treeFit, testdata,na.action = na.pass)
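
As I understand it, predict() on a caret train object defaults to na.action = na.omit, so rows with missing predictors are dropped and the prediction vector can come out shorter than the test set. The base-R sketch below only illustrates that row-dropping effect with model.frame() on a made-up data frame; it does not call caret.

```r
# Made-up data: one row has a missing predictor.
df <- data.frame(x = c(1, NA, 3, 4), y = factor(c("a", "b", "a", "b")))

# na.omit drops the incomplete row; na.pass keeps it.
n_omit <- nrow(model.frame(y ~ x, data = df, na.action = na.omit))
n_pass <- nrow(model.frame(y ~ x, data = df, na.action = na.pass))

c(omit = n_omit, pass = n_pass)   # 3 rows versus 4 rows
```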

Answer 4

Put them into a data frame and then use them in the confusionMatrix function:

pridicted <- factor(predict(treeFit, testdata))
real <- factor(testdata$category)

my_data1 <- data.frame(data = pridicted, type = "prediction")
my_data2 <- data.frame(data = real, type = "real")
my_data3 <- rbind(my_data1,my_data2)

# Check if the levels are identical
identical(levels(my_data3[my_data3$type == "prediction",1]) , levels(my_data3[my_data3$type == "real",1]))

confusionMatrix(my_data3[my_data3$type == "prediction",1], my_data3[my_data3$type == "real",1],  dnn = c("Prediction", "Reference"))

Answer 5

There might be missing values in testdata. Add the following line before "predictionsTree <- predict(treeFit, testdata)" to remove the NAs. I had the same error, and now it works for me.

testdata <- testdata[complete.cases(testdata),]
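
To illustrate what that line does, here is a small sketch on a made-up data frame (the amount column is hypothetical):

```r
# Made-up test set with one incomplete row.
testdata <- data.frame(
  amount   = c(10, NA, 30),
  category = factor(c("a", "b", "a"))
)

keep <- complete.cases(testdata)   # TRUE FALSE TRUE
testdata_clean <- testdata[keep, ]

# Predictions made on testdata_clean line up row for row
# with testdata_clean$category.
nrow(testdata_clean)               # 2
```

Note that subsetting keeps unused factor levels, so the level set of category is unchanged by this step.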

Answer 6

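The length problem you are running into is probably due to NAs in the training set. Either drop the incomplete cases or impute the missing values so that none remain.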

Answer 7

I had the same issue, and fixed it by changing the data right after reading the data file, like so:

data = na.omit(data)

Thanks a lot for the pointers!

Answer 8

Make sure the package is installed with all of its dependencies:

install.packages('caret', dependencies = TRUE)

confusionMatrix( table(prediction, true_value) )

Answer 9

If your data contains NAs, they are sometimes treated as a factor level, so omit these NAs first.
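
One way this can happen, assuming the missing values arrived as the literal string "NA" (for example from a text file), is sketched below on a made-up vector:

```r
# Made-up labels where a missing value arrived as the string "NA".
raw <- c("a", "b", "NA", "a")

f_bad <- factor(raw)
"NA" %in% levels(f_bad)    # TRUE: the string became a level of its own

# Convert the marker to a real NA, then drop it before building the factor.
raw[raw == "NA"] <- NA
f_good <- factor(raw[!is.na(raw)])
levels(f_good)             # "a" "b"
```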

Then, if your model fit predicts some incorrect level, it is better to use table().

Answer 10

I had the same problem and solved it using R's ordered factor data type.

# Use one sorted level set for both factors so the table is square
lvls <- sort(levels(predictionsTree))
table(ordered(predictionsTree, lvls), ordered(testdata$category, lvls))

Error in ConfusionMatrix: the data and reference factors must have the same number of levels
