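I trained a tree model with the R caret package. I am now trying to generate a confusion matrix and keep getting the following error:
Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : The data and reference factors must have the same number of levels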
prob <- 0.5 #Specify class split
singleSplit <- createDataPartition(modellingData2$category, p=prob,
times=1, list=FALSE)
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5)
traindata <- modellingData2[singleSplit,]
testdata <- modellingData2[-singleSplit,]
treeFit <- train(traindata$category~., data=traindata,
trControl=cvControl, method="rpart", tuneLength=10)
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)
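The error occurs when generating the confusion matrix. The levels are the same on both objects, and I cannot figure out what the problem is. Their structure and levels are shown below; they should be identical. Any help would be greatly appreciated, as this is driving me crazy!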
> str(predictionsTree)
Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
> str(testdata$category)
Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...
> levels(predictionsTree)
[1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee" "18-Gov. Stamp Duty" "Misc" "26-Standard Transfer Charge"
[6] "29-Bank Giro Credit" "3-Cheques Debit" "32-Standing Order - Debit" "33-Inter Branch Payment" "34-International"
[11] "35-Point of Sale" "39-Direct Debits Received" "4-Notified Bank Fees" "40-Cash Lodged" "42-International Receipts"
[16] "46-Direct Debits Paid" "56-Credit Card Receipts" "57-Inter Branch" "58-Unpaid Items" "59-Inter Company Transfers"
[21] "6-Notified Interest Credited" "61-Domestic" "64-Charge Refund" "66-Inter Company Transfers" "67-Suppliers"
[26] "68-Payroll" "69-Domestic" "73-Credit Card Payments" "82-CHAPS Fee" "Uncategorised"
> levels(testdata$category)
[1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee" "18-Gov. Stamp Duty" "Misc" "26-Standard Transfer Charge"
[6] "29-Bank Giro Credit" "3-Cheques Debit" "32-Standing Order - Debit" "33-Inter Branch Payment" "34-International"
[11] "35-Point of Sale" "39-Direct Debits Received" "4-Notified Bank Fees" "40-Cash Lodged" "42-International Receipts"
[16] "46-Direct Debits Paid" "56-Credit Card Receipts" "57-Inter Branch" "58-Unpaid Items" "59-Inter Company Transfers"
[21] "6-Notified Interest Credited" "61-Domestic" "64-Charge Refund" "66-Inter Company Transfers" "67-Suppliers"
[26] "68-Payroll" "69-Domestic" "73-Credit Card Payments" "82-CHAPS Fee" "Uncategorised"
Answer 0 (score: 12)
Try using:
confusionMatrix(table(Argument 1, Argument 2))
That worked for me.
Answer 1 (score: 5)
Maybe your model is not predicting one of the factor levels. Use the table() function instead of confusionMatrix() to see whether that is the problem.
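The check suggested above can be sketched in base R as follows. This is only an illustration on toy data: `reference`, `predictions`, and `toy` are hypothetical stand-ins for `testdata$category`, `predictionsTree`, and `testdata`. It also shows that a misspelled column name (such as `catgeory`) returns `NULL` without raising an error, which is worth ruling out before looking at the levels.

```r
# Toy stand-ins for the real objects (hypothetical data, not from the question).
reference   <- factor(c("a", "b", "c", "a", "c"), levels = c("a", "b", "c"))
predictions <- factor(c("a", "a", "c", "a", "c"), levels = c("a", "b", "c"))

# A misspelled column name silently yields NULL rather than an error.
toy <- data.frame(category = reference)
misspelled_is_null <- is.null(toy$catgeory)

# Cross-tabulate predictions against the reference; a class that is never
# predicted shows up as an all-zero row.
xtab <- table(Prediction = predictions, Reference = reference)
never_predicted <- rownames(xtab)[rowSums(xtab) == 0]

# Compare the two level sets directly.
same_levels <- identical(levels(predictions), levels(reference))

print(xtab)
```

If `misspelled_is_null` is TRUE for the column you pass as the reference, the reference has no levels at all, and comparing levels will not help until the name is corrected.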
Answer 2 (score: 2)
Try specifying na.pass for the na.action option:
predictionsTree <- predict(treeFit, testdata, na.action = na.pass)
Answer 3 (score: 1)
Put them into data frames, then use them in the confusionMatrix function:
predicted <- factor(predict(treeFit, testdata))
real <- factor(testdata$category)
my_data1 <- data.frame(data = predicted, type = "prediction")
my_data2 <- data.frame(data = real, type = "real")
# rbind() merges the factor levels of both data frames
my_data3 <- rbind(my_data1, my_data2)
# Check if the levels are identical
identical(levels(my_data3[my_data3$type == "prediction", 1]), levels(my_data3[my_data3$type == "real", 1]))
confusionMatrix(my_data3[my_data3$type == "prediction", 1], my_data3[my_data3$type == "real", 1], dnn = c("Prediction", "Reference"))
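The reason this approach can help is that `rbind()` on data frames expands factor levels as needed, so both halves of the stacked frame end up sharing one level set. A minimal sketch on toy data, where `pred_toy` and `real_toy` are hypothetical stand-ins for the predictions and the true labels:

```r
# Hypothetical factors with different level sets.
pred_toy <- factor(c("a", "a", "c"))  # levels: a, c
real_toy <- factor(c("a", "b", "c"))  # levels: a, b, c

# Stacking the two data frames merges the levels of the `data` column.
stacked <- rbind(
  data.frame(data = pred_toy, type = "prediction"),
  data.frame(data = real_toy, type = "real")
)

# Split the stacked column back into its two parts.
pred_aligned <- stacked[stacked$type == "prediction", "data"]
real_aligned <- stacked[stacked$type == "real", "data"]

# Both parts now carry the same levels.
levels_match <- identical(levels(pred_aligned), levels(real_aligned))
```

Note that this only aligns the levels; it does not help if the reference column itself is missing or misspelled.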
Answer 4 (score: 0)
There may be missing values in testdata. Add the following line before "predictionsTree <- predict(treeFit, testdata)" to remove the NAs. I had the same error and now it works for me.
testdata <- testdata[complete.cases(testdata),]
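If rows with missing values are dropped at prediction time, the predictions can end up shorter than the reference column, so filtering the test set first keeps the two aligned. A small base R sketch of what the line above does, using a hypothetical data frame `testdata_toy`:

```r
# Hypothetical test set with one missing predictor value.
testdata_toy <- data.frame(
  amount   = c(10, NA, 30, 40),
  category = factor(c("x", "y", "x", "y"))
)

# Keep only the rows that have no NA in any column.
clean <- testdata_toy[complete.cases(testdata_toy), ]

# na.omit() selects the same rows.
same_rows <- identical(rownames(clean), rownames(na.omit(testdata_toy)))

print(nrow(clean))
```

Both predictions and reference labels should then be taken from the same filtered data frame.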
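Answer 5 (score: 0)
The length problem you are running into is probably due to NAs in the training set. Either drop the incomplete cases or impute the missing values so that none remain.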
Answer 6 (score: 0)
I had the same problem, and fixed it by adding this right after reading in the data file:
data = na.omit(data)
Thanks a lot for the pointers!
Answer 7 (score: 0)
Make sure the package is installed with all of its dependencies:
install.packages('caret', dependencies = TRUE)
confusionMatrix( table(prediction, true_value) )
Answer 8 (score: 0)
If your data contains NAs, they are sometimes treated as a factor level, so omit those NAs first.
Then, if the levels predicted by your fitted model are still not correct, it is better to pass a table to confusionMatrix().
Answer 9 (score: 0)
I had the same problem and solved it using R's ordered factor data type.
levels <- levels(predictionsTree)
levels <- levels[order(levels)]
table(ordered(predictionsTree, levels), ordered(testdata$category, levels))
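The same idea, forcing both vectors onto one shared level set before tabulating, can be sketched with plain factors. This is an illustration on toy data, not the answer's exact method: `pred_toy`, `real_toy`, and `shared` are hypothetical names, and the union of both level sets is used instead of the levels of the predictions alone.

```r
# Hypothetical factors whose level sets differ.
pred_toy <- factor(c("b", "a", "a"))  # levels: a, b
real_toy <- factor(c("b", "c", "a"))  # levels: a, b, c

# Build one sorted level set covering both vectors.
shared <- sort(union(levels(pred_toy), levels(real_toy)))

# Re-encode both vectors with the shared levels, then cross-tabulate.
xtab <- table(
  Prediction = factor(pred_toy, levels = shared),
  Reference  = factor(real_toy, levels = shared)
)

# With a shared level set the table is square.
is_square <- nrow(xtab) == ncol(xtab)

print(xtab)
```

Using the union avoids silently turning labels into NA when the reference contains a class the model never predicted.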