I have a problem when running a classification tree in R using the function tree() and the following piece of code:
library(tree)
library(ISLR)
attach(Carseats)
High=ifelse(Sales<=8, "No", "Yes")
Carseats=data.frame(Carseats, High)
tree.carseats=tree(High~.-Sales, Carseats)
summary(tree.carseats)
The problem is that when I run all the code together for the first time, I get the same results as the book I am referring to (Introduction to Statistical Learning):
Classification tree:
tree(formula = High ~ . - Sales, data = Carseats)
Variables actually used in tree construction:
[1] "ShelveLoc" "Price" "Income" "CompPrice" "Population" "Advertising" "Age" "US"
Number of terminal nodes: 27
Residual mean deviance: 0.4575 = 170.7 / 373
Misclassification error rate: 0.09 = 36 / 400
However, when I run the same code again, the tree no longer gives meaningful results:
Classification tree:
tree(formula = High ~ . - Sales, data = Carseats)
Variables actually used in tree construction:
[1] "High.1"
Number of terminal nodes: 2
Residual mean deviance: 0 = 0 / 398
Misclassification error rate: 0 = 0 / 400
Can someone explain to me what is going on?
Thanks.
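Answer 0 (score: 1)
It has been a while, but I still hope my answer helps you and anyone else who runs into the same problem.
I think the problem is the variable name "Carseats": you assign the new data.frame to the same name as the original dataset. I changed the name to "Car" (for example), and it worked: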
library(tree)
library(ISLR)
attach(Carseats)
High = ifelse(Sales <= 8, "No", "Yes")
Car = data.frame(Carseats, High)
tree.carseats = tree(High~.-Sales, Car)
summary(tree.carseats)
Alternatively, you can do it this way:
library(tree)
library(ISLR)
attach(Carseats)
High = ifelse(Sales <= 8, "No", "Yes")
New = cbind(Carseats, High)
tree.carseats = tree(High~.-Sales, New)
summary(tree.carseats)
Here I used cbind() to combine the Carseats dataset and High into a new dataset called "New".
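For what it's worth, the "High.1" in the second summary can be reproduced without the tree package. The sketch below uses a small made-up data frame (toy, with hypothetical values, standing in for Carseats) and only base R: calling data.frame() a second time on a frame that already has a High column appends a duplicate, which R renames to High.1. The tree can then split on that copy of the response, which would explain the two-node tree with zero error.

```r
# Toy stand-in for Carseats (hypothetical values), base R only
toy <- data.frame(Sales = c(5, 9, 12, 7))
High <- ifelse(toy$Sales <= 8, "No", "Yes")

# First run: the frame gains a column named "High"
toy <- data.frame(toy, High)
first_names <- names(toy)
print(first_names)

# Second run of the same line: "High" already exists, so
# data.frame() makes the names unique and the copy becomes "High.1"
toy <- data.frame(toy, High)
second_names <- names(toy)
print(second_names)
```

If that is what happened, renaming to Car or New probably only helps while Carseats itself is still the untouched original. If an earlier run already overwrote it in your workspace, running rm(Carseats) first should make the ISLR copy visible again.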
If you did exactly what the book does, the problem may be caused by a difference in RStudio versions, which the book (ISLR) does not mention.
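Whatever the cause, one way to make the script safe to rerun is to add the column by name with `$`, which overwrites an existing High instead of appending a copy. This is a minimal sketch on a made-up data frame (Car and its values are hypothetical stand-ins for Carseats). The factor() call is my addition, on the assumption that tree() needs a factor response for classification and that data.frame() in R 4.0 and later no longer converts strings to factors by default.

```r
# Hypothetical stand-in for Carseats, base R only
Car <- data.frame(Sales = c(5, 9, 12, 7), Price = c(120, 83, 80, 97))

# Simulate running the script twice
for (run in 1:2) {
  # Assigning by name replaces the column rather than adding "High.1"
  Car$High <- factor(ifelse(Car$Sales <= 8, "No", "Yes"))
}

print(names(Car))
print(levels(Car$High))
```

With the real data, the same pattern would be Car <- ISLR::Carseats followed by tree(High ~ . - Sales, data = Car).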
Hope this helps! :)