Classification Tree in R multiple times

Time: 2017-05-16 09:23:10

Tags: r tree statistics classification

I have a problem when running a classification tree in R using the function tree() and the following piece of code:

library(tree)
library(ISLR)
attach(Carseats)
High=ifelse(Sales<=8, "No", "Yes")
Carseats=data.frame(Carseats, High)
tree.carseats=tree(High~.-Sales, Carseats)
summary(tree.carseats)

The problem is that when I run all the code together for the first time, I get the same results as the book I am referring to (Introduction to Statistical Learning):

Classification tree:
tree(formula = High ~ . - Sales, data = Carseats)
Variables actually used in tree construction:
[1] "ShelveLoc"   "Price"       "Income"      "CompPrice"   "Population"  "Advertising" "Age"         "US"         
Number of terminal nodes:  27 
Residual mean deviance:  0.4575 = 170.7 / 373 
Misclassification error rate: 0.09 = 36 / 400 

However, when I run the same code again, the tree no longer gives meaningful results:

Classification tree:
tree(formula = High ~ . - Sales, data = Carseats)
Variables actually used in tree construction:
[1] "High.1"
Number of terminal nodes:  2 
Residual mean deviance:  0 = 0 / 398 
Misclassification error rate: 0 = 0 / 400 

Can someone explain to me what is going on?

Thanks.

1 answer:

Answer 0 (score: 1)

It has been a while, but I still hope my answer helps you and anyone else who runs into the same problem.

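I think the problem lies in the variable name "Carseats": you assign the new data.frame to the same name as the original dataset. On a second run, Carseats already contains a High column, so data.frame(Carseats, High) appends a duplicate named High.1, which the tree then uses as a predictor. I changed the name to "Car" (for example), and it worked: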

 library(tree)
 library(ISLR)
 attach(Carseats)
 High = ifelse(Sales <= 8, "No", "Yes")
 Car = data.frame(Carseats, High)
 tree.carseats = tree(High~.-Sales, Car)
 summary(tree.carseats)
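
The High.1 variable in the question's second output can be reproduced with base R alone. The sketch below uses a small made-up data frame named df as a stand-in for Carseats (the values are hypothetical, not the real data) and repeats the data.frame() step twice, the way re-running the script does:

```r
# Toy stand-in for Carseats; the numbers are invented for illustration only
df <- data.frame(Sales = c(5, 9, 12, 7), Price = c(120, 83, 80, 97))
High <- ifelse(df$Sales <= 8, "No", "Yes")

# First run: a single new column named High is appended
df <- data.frame(df, High)
print(names(df))

# Second run: df already has a High column, so data.frame() makes the
# column names unique and the new copy becomes High.1
df <- data.frame(df, High)
print(names(df))

# High.1 carries exactly the same values as the response High, which is
# why a tree fitted on High ~ . can split on it and classify perfectly
print(all(as.character(df$High) == as.character(df$High.1)))
```

Writing the result to a different name, as with Car above, avoids this because the original Carseats never gains a High column.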

Alternatively, you can do it this way:

library(tree)
library(ISLR)
attach(Carseats)
High = ifelse(Sales <= 8, "No", "Yes")
New = cbind(Carseats, High)
tree.carseats = tree(High~.-Sales, New)
summary(tree.carseats)

Here I used cbind() to combine the Carseats dataset and High into a new dataset called "New".
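
Another option, not taken from the answer above, is to assign the column by name so that re-running the script overwrites High instead of appending a copy. This is a minimal sketch on the same kind of made-up data frame (df and its values are hypothetical); the factor() wrapper is an addition that assumes R 4.0 or later, where data.frame() no longer converts strings to factors by default:

```r
# Toy stand-in for Carseats; the numbers are invented for illustration only
df <- data.frame(Sales = c(5, 9, 12, 7), Price = c(120, 83, 80, 97))

# Simulate running the script twice
for (run in 1:2) {
  # Assignment by name replaces an existing High column rather than
  # adding a second one, so the step is safe to repeat
  df$High <- factor(ifelse(df$Sales <= 8, "No", "Yes"))
}

print(names(df))
print(levels(df$High))
```

With the real data, the equivalent step would be to assign Carseats$High in the same way before calling tree(High~.-Sales, Carseats); that combination is untested here.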

Perhaps this problem (if you followed the book exactly) is caused by a difference in RStudio versions, which the book (ISLR) does not mention.

Hope this helps! :)