Question

I have a problem when running a classification tree in R using the function tree() and the following piece of code:

library(tree)
library(ISLR)
attach(Carseats)
High=ifelse(Sales<=8, "No", "Yes")
Carseats=data.frame(Carseats, High)
tree.carseats=tree(High~.-Sales, Carseats)
summary(tree.carseats)

The problem is that when I run all the code together for the first time, I get the same results as the book I am referring to (Introduction to Statistical Learning):

Classification tree:
tree(formula = High ~ . - Sales, data = Carseats)
Variables actually used in tree construction:
[1] "ShelveLoc"   "Price"       "Income"      "CompPrice"   "Population"  "Advertising" "Age"         "US"         
Number of terminal nodes:  27 
Residual mean deviance:  0.4575 = 170.7 / 373 
Misclassification error rate: 0.09 = 36 / 400

However, when I run the same code again, the tree no longer gives meaningful results:

Classification tree:
tree(formula = High ~ . - Sales, data = Carseats)
Variables actually used in tree construction:
[1] "High.1"
Number of terminal nodes:  2 
Residual mean deviance:  0 = 0 / 398 
Misclassification error rate: 0 = 0 / 400

Can someone explain to me what is going on?

Thanks.

Answer 1

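It has been a while, but I still hope my answer helps you and anyone else who runs into the same problem.

I think the problem lies with the variable name "Carseats": you assign the new data.frame to the same name as the original dataset. On a second run, Carseats already contains a High column, so data.frame(Carseats, High) appends a duplicate that R renames High.1, and the tree splits on that copy of the response. I changed the name to "Car" (for example), and it works: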

 library(tree)
 library(ISLR)
 attach(Carseats)
 High = ifelse(Sales <= 8, "No", "Yes")
 Car = data.frame(Carseats, High)
 tree.carseats = tree(High~.-Sales, Car)
 summary(tree.carseats)
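
To see where the High.1 column in the question's second summary comes from, here is a minimal base-R sketch. The data frame toy and its values are made up for illustration and merely stand in for Carseats; only the renaming behaviour of data.frame() is the point.

```r
# Hypothetical toy data standing in for Carseats (values are invented).
toy <- data.frame(Sales = c(5, 9, 12, 7), Price = c(120, 95, 80, 110))
High <- ifelse(toy$Sales <= 8, "No", "Yes")

# First run: the response column is added once.
toy <- data.frame(toy, High)
names(toy)
# [1] "Sales" "Price" "High"

# Second run: toy already has a High column, so data.frame() makes the
# duplicate name unique by renaming the new copy to High.1.
toy <- data.frame(toy, High)
names(toy)
# [1] "Sales"  "Price"  "High"   "High.1"
```

With High.1 present, the formula High ~ . - Sales treats it as a predictor, and since it is an exact copy of the response, a single split on it classifies every row correctly.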

Alternatively, you can do it this way:

library(tree)
library(ISLR)
attach(Carseats)
High = ifelse(Sales <= 8, "No", "Yes")
New = cbind(Carseats, High)
tree.carseats = tree(High~.-Sales, New)
summary(tree.carseats)

Here I used cbind() to combine the Carseats dataset and High into a new dataset named "New".
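
Both variants work because Carseats itself is never overwritten. A version that can be rerun any number of times might look like the sketch below. It assumes the tree and ISLR packages are installed, reuses the illustrative name Car, and skips attach() altogether. The factor() call is an addition not in the original code: since R 4.0, data.frame() no longer converts strings to factors, and tree() expects a factor response for classification.

```r
library(tree)
library(ISLR)

# Start from the packaged copy, so a Carseats object left over in the
# workspace from an earlier run is ignored.
Car <- ISLR::Carseats

# Assigning to a named column overwrites it on a rerun instead of
# appending a duplicate such as High.1.
Car$High <- factor(ifelse(Car$Sales <= 8, "No", "Yes"))

tree.carseats <- tree(High ~ . - Sales, data = Car)
summary(tree.carseats)
```

Because every line rebuilds its result from the untouched package data, running the script twice gives the same tree both times.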

If you followed the book exactly, the problem may instead be due to a difference in RStudio versions, which the book (ISLR) does not mention.

Hope this helps! :)

