I have this data:
training.csv:
A B D
22 1 1
32 2 1
34 3 0
44 4 1
testing.csv:
A B
12 1
33 2
21 3
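Note that the testing data does not contain the dependent variable.
I am trying to fit several models and then compute a confusion matrix for each one.
However, because the testing data does not contain the dependent variable ("D"), I get this message:
Error in confusionMatrix.default(X[[i]], ...) :
the data cannot have more levels than the reference
How can I get around this?
If I add an empty column (filled with NA) to the testing data and then compute the confusion matrix, I get the same error again.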
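To make the symptom concrete, here is a minimal base-R sketch of what I think is going on. It assumes the error comes from comparing the number of factor levels in the predictions with the number in the reference; the exact check inside caret may differ between versions. The objects `pred`, `ref_na` and `missing_col` are hypothetical stand-ins, not my real model output.

```r
# Hypothetical stand-ins for the objects in my code (not real model output).
pred <- factor(c("yes", "no", "yes"), levels = c("no", "yes"))  # predicted classes
ref_na <- factor(rep(NA, 3))            # "empty" D column filled with NA
missing_col <- data.frame(A = 1:3)$D    # a column that does not exist gives NULL

nlevels(pred)         # the predictions carry two levels: "no" and "yes"
nlevels(ref_na)       # a factor built only from NA has no levels
is.null(missing_col)  # accessing the absent column returns NULL

# The comparison the error message seems to describe:
# more levels in the data than in the reference.
nlevels(pred) > nlevels(ref_na)
```

If that reading is right, the NA column does not help because it still gives the reference zero levels, which is the same situation as having no column at all.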
library(caret)
tr_data <- read.csv("./training.csv", header = TRUE)
t_data <- read.csv("./testing.csv", header = TRUE)
tr_data$D <- as.factor(ifelse(tr_data$D == 1, "yes", "no"))
trctrl <- trainControl(method = "cv", classProbs = TRUE, summaryFunction = mnLogLoss)
rpart_m <- train(D~., data = tr_data, method = "rpart", trControl = trctrl, metric = "logLoss")
rf_m <- train(D~., data = tr_data, method = "glm", trControl = trctrl, metric = "logLoss")
all_models <- list(rf = rf_m, rpart = rpart_m)
pred <- predict(all_models, newdata = t_data)
# Generate confusion matrix for each model
lapply(pred, FUN = confusionMatrix, reference = t_data$D, positive = "yes")