Question

After building a decision tree for Fisher's iris data, I get a misclassification error rate of 0.02667 = 4/150. But in my plot I can see only 3 errors: [plot: DS for the iris]

If we look at the class probabilities for this point, everything is fine (virginica, the same as in the plot above):

   setosa versicolor  virginica
   0      0.1666667   0.83333333

Can you explain why this misclassification count occurs (4 errors rather than the 3 that are clearly visible in the plot)?

Code:

# install.packages("tree")
# install.packages("ggplot2")

library('tree')
library('ggplot2') 

data(iris)

iris <- iris[ , c('Petal.Length', 'Petal.Width', 'Species')]
myTree <- tree(Species ~ Petal.Length + Petal.Width, data = iris)
summary(myTree)

# Classification tree:
# tree(formula = Species ~ Petal.Length + Petal.Width, data = iris)
# Number of terminal nodes:  5 
# Residual mean deviance:  0.157 = 22.77 / 145 
# Misclassification error rate: 0.02667 = 4 / 150 

# The errors were found by comparing predict(myTree, iris, type = "class")
# with the Species column of the original data set
errors <- data.frame(
Species = c('versicolor', 'versicolor', 'versicolor', 'virginica'),
Petal.Length = c(4.8, 5.0, 5.1, 4.5), Petal.Width = c(1.8, 1.7, 1.6, 1.7))

ggplot(iris, aes(x = Petal.Length, y = Petal.Width, colour = Species)) + 
 geom_point(size = 2.1) +
 geom_vline(xintercept = 2.45) +
 geom_hline(yintercept = 1.75) +
 geom_vline(xintercept = 4.95) + 
 geom_point(data = errors, shape = 1, size = 5,colour = "black")
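
The comparison mentioned in the comments can also be done programmatically, which gives the row numbers of the misclassified observations instead of hand-typed coordinates. This is a minimal sketch, assuming the `tree` package is installed; the names `fit`, `pred`, and `misclassified` are chosen here for illustration only.

```r
library(tree)

data(iris)

# Fit the same model as above.
fit <- tree(Species ~ Petal.Length + Petal.Width, data = iris)

# Predicted class for every training observation.
pred <- predict(fit, iris, type = "class")

# Rows where the predicted species differs from the recorded one.
# Comparing as character avoids any issue with factor level sets.
wrong <- as.character(pred) != as.character(iris$Species)
misclassified <- iris[wrong, c("Petal.Length", "Petal.Width", "Species")]

misclassified
nrow(misclassified)
```

Because the result keeps the original row names, two observations with identical coordinates but different species remain distinguishable.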

Answer 1

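The point you are focusing on is not misclassified.

But there are multiple observations at that point, and they do not all have the same species. Add some jitter to the plot...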

ggplot(iris, aes(x = Petal.Length, y = Petal.Width, colour = Species)) +
   geom_point(position = "jitter") +
   geom_vline(xintercept = 4.95) + geom_vline(xintercept = 2.45) + geom_hline(yintercept = 1.75)

and you will see what is actually happening.

From the data...

> iris[iris$Petal.Length == 4.8 & iris$Petal.Width == 1.8,]
    Petal.Length Petal.Width    Species
71           4.8         1.8 versicolor
127          4.8         1.8  virginica
139          4.8         1.8  virginica
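
The same check can be run over the whole data set to find every coordinate where points of different species are stacked on top of each other. This is a rough sketch in base R; the names `key`, `n_obs`, `n_species`, and `mixed` are made up for this illustration.

```r
data(iris)

# Build a text key for each (Petal.Length, Petal.Width) pair.
key <- paste(iris$Petal.Length, iris$Petal.Width)

# Number of observations and number of distinct species at each pair.
species   <- as.character(iris$Species)
n_obs     <- tapply(species, key, length)
n_species <- tapply(species, key, function(s) length(unique(s)))

# Pairs where overplotting hides points of different species.
mixed <- names(n_species)[n_species > 1]

# Show the affected rows.
iris[key %in% mixed, c("Petal.Length", "Petal.Width", "Species")]
```

For the pair (4.8, 1.8) this reports three observations covering two species, which is why a single marker on the plot can hide a misclassified versicolor beneath correctly classified virginica points.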
