Decision tree in R

Date: 2016-02-08 13:50:55

Tags: r tree machine-learning decision-tree

My dataset is:


x <- data.frame(v1 = c(97, 97, 85, 84, 90, 80, 81, 90, 80, 70, 90, 90, 90, 95, 88, 99),
                v2 = c(99, 91, 91, 83, 99, 95, 74, 88, 82, 80, 96, 87, 92, 96, 88, 95),
                v3 = c(89, 93, 87, 80, 96, 96, 75, 90, 78, 86, 92, 88, 80, 88, 98, 98),
                v4 = c(89, 97, 91, 86, 95, 95, 89, 88, 75, 82, 99, 92, 95, 92, 90, 98),
                v5 = c(99, 90, 93, 91, 90, 90, 77, 92, 85, 76, 90, 96, 90, 90, 90, 92))

> x
   v1 v2 v3 v4 v5
1  97 99 89 89 99
2  97 91 93 97 90
3  85 91 87 91 93
4  84 83 80 86 91
5  90 99 96 95 90
6  80 95 96 95 90
7  81 74 75 89 77
8  90 88 90 88 92
9  80 82 78 75 85
10 70 80 86 82 76
11 90 96 92 99 90
12 90 87 88 92 96
13 90 92 80 95 90
14 95 96 88 92 90
15 88 88 98 90 90
16 99 95 98 98 92

I use the rpart package to apply a decision tree and plot the tree, as follows:

# Classification Tree with rpart
library(rpart)
fit <- rpart(v5 ~ v1+v2+v3+v4,
              method="class", data=x)

printcp(fit) # display the results 

Classification tree:
rpart(formula = v5 ~ v1 + v2 + v3 + v4, data = x, method = "class")

Variables actually used in tree construction:
character(0)

Root node error: 9/16 = 0.5625

n= 16 

    CP nsplit rel error xerror xstd
1 0.01      0         1      0    0


> summary(fit) # detailed summary of splits

Call:
rpart(formula = v5 ~ v1 + v2 + v3 + v4, data = x, method = "class")
  n= 16 

    CP nsplit rel error xerror xstd
1 0.01      0         1      0    0

Node number 1: 16 observations
  predicted class=90  expected loss=0.5625  P(node) =1
    class counts:     1     1     1     7     1     2     1     1     1
   probabilities: 0.062 0.062 0.062 0.438 0.062 0.125 0.062 0.062 0.062 

When I apply

# plot tree
plot(fit, uniform=TRUE,
     main="Classification Tree ")
Error in plot.rpart(fit, uniform = TRUE, main = "Classification Tree ") :
  fit is not a tree, just a root
text(fit, use.n=TRUE, all=TRUE, cex=.8)
Error in text.rpart(fit, use.n = TRUE, all = TRUE, cex = 0.8) :
  fit is not a tree, just a root

What is wrong? Why does plotting the tree give me an error instead of a tree plot? How can I fix this error:

fit is not a tree, just a root

3 Answers:

Answer 0 (score: 2)

Use method = "class" if you want to build a classification tree, and method = "anova" if you want to build a regression tree. It looks like you have a continuous response, so you should build a regression tree (i.e. method = "anova").
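
A minimal sketch of that change, assuming the x data frame from the question:

library(rpart)

# v5 is continuous, so fit a regression tree with method = "anova"
fit <- rpart(v5 ~ v1 + v2 + v3 + v4, method = "anova", data = x)
printcp(fit)  # note: with only 16 rows, the default minsplit of 20 may still leave just a root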

Answer 1 (score: 0)

You are using rpart's default control parameters. With your dataset, rpart cannot satisfy the defaults and build a tree (i.e. make any branch splits):

rpart.control(minsplit = 20, minbucket = round(minsplit/3), cp = 0.01, 
              maxcompete = 4, maxsurrogate = 5, usesurrogate = 2, xval = 10,
              surrogatestyle = 0, maxdepth = 30, ...)

Adjust the control parameters to your dataset. Here, for instance, the default minsplit = 20 can never be met with only 16 observations, so the tree never splits beyond the root.

For example:

t <- rpart(formula = v5 ~ v1 + v2 + v3 + v4, data = x, method = "anova",
           control = rpart.control(minsplit = 1, minbucket = 1, cp = 0))

But note that this may create an overfitted decision tree.
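
With a fit built that way, the plotting calls from the question should no longer fail. A minimal sketch, reusing the x data frame and the control settings above:

# refit with relaxed control parameters so the tree actually splits
fit <- rpart(v5 ~ v1 + v2 + v3 + v4, data = x, method = "anova",
             control = rpart.control(minsplit = 1, minbucket = 1, cp = 0))

# the tree now has splits, so the base-graphics plotting calls work
plot(fit, uniform = TRUE, main = "Regression Tree")
text(fit, use.n = TRUE, all = TRUE, cex = 0.8)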

Answer 2 (score: 0)

I ran the following code with your x data frame and got a tree, as shown below:

library(rpart)
library(rattle)
library(rpart.plot)
library(RColorBrewer)

fit <- rpart(v5 ~ v1+v2+v3+v4,
             method="anova", 
             data=x,
             control = rpart.control(minsplit = 6, cp = 0.01))
fancyRpartPlot(fit)  # from the rattle package

[image: fancyRpartPlot output of the fitted tree]

Note that your method should be anova, because v5 is a continuous variable, and you must override the control parameters with control = rpart.control(...) to adjust the depth of the tree.
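
As a quick sanity check on such a fit, you can compare the tree's predictions with the observed v5 values. A minimal sketch, assuming the fit object from the code above:

# predicted v5 values from the regression tree (assumes fit and x from above)
pred <- predict(fit, newdata = x)
cbind(actual = x$v5, predicted = round(pred, 1))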