Handling test data without a dependent variable (applying a confusion matrix)

Posted: 2017-06-20 12:48:21

Tags: r r-caret confusion-matrix

I have the following data:

training.csv:

A   B   D
22  1   1
32  2   1
34  3   0
44  4   1

testing.csv:

A   B
12  1
33  2
21  3

Note that the test data does not contain the dependent variable (D).
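To reproduce the problem without the CSV files, the two tables can be built inline. This is only a convenience sketch; the column names and values are copied from the tables above.

```r
# Inline copies of training.csv and testing.csv, so the example runs
# without any files on disk
tr_data <- data.frame(
  A = c(22, 32, 34, 44),
  B = c(1, 2, 3, 4),
  D = c(1, 1, 0, 1)
)
t_data <- data.frame(
  A = c(12, 33, 21),
  B = c(1, 2, 3)
)

# The test set has no outcome column D
"D" %in% names(t_data)  # FALSE
```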

I am trying to fit several different models and then evaluate each one with a confusion matrix.
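A confusion matrix cross-tabulates predicted classes against the true classes, so it needs the true labels. Here is a minimal base-R sketch: the true labels are D from training.csv, recoded to "yes"/"no" as in the code below, and the predictions are hypothetical. caret's confusionMatrix() builds the same table (rows = Prediction, columns = Reference) and adds summary statistics.

```r
# True labels: D from training.csv (1, 1, 0, 1) recoded as "yes"/"no"
truth <- factor(c("yes", "yes", "no", "yes"), levels = c("no", "yes"))
# Hypothetical predictions from some model for the same four rows
pred  <- factor(c("yes", "no", "no", "yes"), levels = c("no", "yes"))

# Rows = predictions, columns = true labels
cm <- table(Prediction = pred, Reference = truth)
print(cm)

# Accuracy = correctly classified rows (the diagonal) / all rows
accuracy <- sum(diag(cm)) / sum(cm)
accuracy  # 0.75
```

Without a truth vector there is nothing to put in the columns, which is the situation with testing.csv.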

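However, because the test data does not contain the dependent variable ("D"), I get this error:

Error in confusionMatrix.default(X[[i]], ...) : the data cannot have more levels than the reference

How can I get around this?

If I add an empty column (filled with NA) to the test data and then apply the confusion matrix, I get the same error.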
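The error message suggests that caret compares the number of factor levels in the predictions with the number in the reference. The base-R sketch below shows why both attempts fail: a missing D column and an all-NA D column each give a reference with zero levels. The predictions are hypothetical.

```r
# Test data as in testing.csv: no D column
t_data <- data.frame(A = c(12, 33, 21), B = c(1, 2, 3))

# Hypothetical predictions from any of the models: a two-level factor
pred <- factor(c("yes", "no", "yes"), levels = c("no", "yes"))

# Case 1: the column does not exist, so t_data$D is NULL
ref_missing <- factor(t_data$D)
length(levels(ref_missing))  # 0

# Case 2: an all-NA placeholder column, as tried in the question
t_data$D <- NA
ref_na <- factor(t_data$D)
length(levels(ref_na))  # 0

# In both cases the predictions have more levels than the reference,
# which is the condition the error message describes
length(levels(pred)) > length(levels(ref_na))  # TRUE
```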

library(caret)

tr_data <- read.csv("./training.csv", header = TRUE)
t_data <- read.csv("./testing.csv", header = TRUE)

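# Recode the 0/1 outcome as a factor with valid R level names ("no"/"yes"),
# which caret requires when classProbs = TRUE
tr_data$D <- as.factor(ifelse(tr_data$D == 1, "yes", "no"))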

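# Cross-validation (10 folds by default), with class probabilities and
# multinomial log loss (mnLogLoss) as the metric to optimise
trctrl <- trainControl(method = "cv", classProbs = TRUE, summaryFunction = mnLogLoss)
rpart_m <- train(D~., data = tr_data, method = "rpart", trControl = trctrl, metric = "logLoss")
# Note: despite its name, rf_m is a logistic regression (method = "glm"), not a random forest
rf_m <- train(D~., data = tr_data, method = "glm", trControl = trctrl, metric = "logLoss")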

all_models <- list(rf = rf_m, rpart = rpart_m)
# predict() on a list of train objects returns a list of class predictions, one per model
pred <- predict(all_models, newdata = t_data)

# Generate a confusion matrix for each model.
# This is the line that fails: t_data has no D column, so t_data$D is NULL
lapply(pred, FUN = confusionMatrix, reference = t_data$D, positive = "yes")
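Because a confusion matrix compares predictions with known labels, it can only be computed on rows where D is observed. One common workaround is to hold out part of the labelled data, never train on it, and build the confusion matrix on that held-out part. Below is a base-R sketch of this idea, with some assumptions. The data frame `labelled` is simulated and hypothetical, because the four rows of training.csv are too few to split. The 75/25 split and the 0.5 probability threshold are arbitrary choices.

```r
# Hypothetical labelled data (simulated, not from the question); in practice
# this would be tr_data with its real D column
set.seed(1)
n <- 100
labelled <- data.frame(A = runif(n, 10, 50), B = sample(1:4, n, replace = TRUE))
labelled$D <- factor(ifelse(labelled$A + rnorm(n, sd = 8) > 30, "yes", "no"),
                     levels = c("no", "yes"))

# Hold out 25% of the labelled rows; the model never sees them
idx <- sample(seq_len(n), size = 0.75 * n)
train_part <- labelled[idx, ]
hold_part  <- labelled[-idx, ]

# Logistic regression (like the question's glm model) fitted on the training part
fit <- glm(D ~ A + B, data = train_part, family = binomial)

# Predicted probability of the second level ("yes"), thresholded at 0.5
p <- predict(fit, newdata = hold_part, type = "response")
pred <- factor(ifelse(p > 0.5, "yes", "no"), levels = c("no", "yes"))

# Both factors share the same levels, so the cross-tabulation is well defined
cm <- table(Prediction = pred, Reference = hold_part$D)
print(cm)
```

With caret loaded, `confusionMatrix(pred, hold_part$D, positive = "yes")` accepts the same two factors and adds accuracy, sensitivity, and related statistics. The rows in testing.csv can still be passed to predict(), but they cannot be scored until their true D values are known.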
