I am fitting a decision tree to classify wine quality. When I call predict(), it returns numbers instead of the factor I want. I am using the following code:
library(rpart)
wine <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep = ";")
wine$taste <- ifelse(wine$quality < 5, "yucky", "tasty")
wine$taste[wine$quality == 5] <- "fine"
wine$taste <- as.factor(wine$taste)
set.seed(123)
sample <- sample(nrow(wine), 0.7 * nrow(wine))
train <- wine[sample, ]
test <- wine[-sample, ]
DecisionTree <- rpart(taste ~ ., data = train)
pred <- predict(DecisionTree, test)
What I get looks like this:
head(pred, 10)
   fine tasty yucky
6     1     0     0
14    1     0     0
18    1     0     0
23    1     0     0
24    1     0     0
25    0     1     0
26    1     0     0
30    0     1     0
33    1     0     0
35    1     0     0
I am trying to compute the model's accuracy with the following:
acc <- table(pred, test$taste)
sum(diag(acc)) / sum(acc)
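The error I get says the arguments have different lengths, which is caused by the format of pred: it is a matrix with one column per class rather than a single vector of labels. I think the output should look like this: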
6 14 18 23 24 25 26 30 33
fine fine fine fine fine fine fine tasty fine
Levels: fine tasty yucky
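I'm not sure what I'm doing wrong. I suspect it has something to do with the rpart() function. When I follow the same procedure with a random forest, it works perfectly. Any help would be appreciated.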
Answer 0 (score: 0)
You need to specify type = "class" in predict(). From ?predict.rpart:
If type = "class":
(for a classification tree) a factor of classifications based on the responses.
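To make the difference concrete, here is a minimal sketch comparing the two return types. It assumes the built-in iris data as a stand-in for the wine data so that it runs offline; the names fit, prob, cls, and from_prob are illustrative and not part of the original code.

```r
library(rpart)

# Assumption: iris replaces the wine data purely for illustration.
fit <- rpart(Species ~ ., data = iris)

# With no type argument, a classification tree returns class probabilities:
# a matrix with one row per observation and one column per class.
prob <- predict(fit, iris)

# type = "class" returns a factor holding one predicted label per observation.
cls <- predict(fit, iris, type = "class")

dim(prob)
is.factor(cls)

# If only the probability matrix is available, labels can be recovered by
# picking the most probable column in each row.
from_prob <- factor(colnames(prob)[max.col(prob, ties.method = "first")],
                    levels = levels(iris$Species))
```

The last step is only a fallback. Asking predict() for type = "class" directly is simpler and avoids having to choose a tie-breaking rule.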
You can confirm that it gives the desired output:
pred <- predict(DecisionTree, test, type = "class")
head(pred, 10)
# 3 7 12 14 15 21 22 23 27 30
# fine fine fine fine fine tasty fine fine fine tasty
#Levels: fine tasty yucky
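With a factor of predictions, the accuracy calculation from the question should work as written. The sketch below walks through it end to end, again assuming the built-in iris data instead of the wine data; idx, acc_table, accuracy, and accuracy2 are hypothetical names chosen for the example.

```r
library(rpart)

# Assumption: iris replaces the wine data so the example is self-contained.
set.seed(123)
idx   <- sample(nrow(iris), floor(0.7 * nrow(iris)))
train <- iris[idx, ]
test  <- iris[-idx, ]

fit  <- rpart(Species ~ ., data = train)
pred <- predict(fit, test, type = "class")  # factor, same length as nrow(test)

# Confusion table: both arguments now have the same length and the same levels,
# so the table is square and its diagonal counts the correct predictions.
acc_table <- table(predicted = pred, actual = test$Species)
accuracy  <- sum(diag(acc_table)) / sum(acc_table)

# Equivalent shortcut: the share of predictions that match the true label.
accuracy2 <- mean(pred == test$Species)
```

Both expressions give the same number. The table version is handy when you also want to see which classes get confused with each other.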