My dataset is:
x <- data.frame(v1 = c(97, 97, 85, 84, 90, 80, 81, 90, 80, 70, 90, 90, 90, 95, 88, 99),
                v2 = c(99, 91, 91, 83, 99, 95, 74, 88, 82, 80, 96, 87, 92, 96, 88, 95),
                v3 = c(89, 93, 87, 80, 96, 96, 75, 90, 78, 86, 92, 88, 80, 88, 98, 98),
                v4 = c(89, 97, 91, 86, 95, 95, 89, 88, 75, 82, 99, 92, 95, 92, 90, 98),
                v5 = c(99, 90, 93, 91, 90, 90, 77, 92, 85, 76, 90, 96, 90, 90, 90, 92))
> x
v1 v2 v3 v4 v5
1 97 99 89 89 99
2 97 91 93 97 90
3 85 91 87 91 93
4 84 83 80 86 91
5 90 99 96 95 90
6 80 95 96 95 90
7 81 74 75 89 77
8 90 88 90 88 92
9 80 82 78 75 85
10 70 80 86 82 76
11 90 96 92 99 90
12 90 87 88 92 96
13 90 92 80 95 90
14 95 96 88 92 90
15 88 88 98 90 90
16 99 95 98 98 92
I use the rpart package to fit a decision tree, as follows:
# Classification Tree with rpart
library(rpart)
fit <- rpart(v5 ~ v1+v2+v3+v4,
method="class", data=x)
printcp(fit) # display the results
Classification tree:
rpart(formula = v5 ~ v1 + v2 + v3 + v4, data = x, method = "class")
Variables actually used in tree construction:
character(0)
Root node error: 9/16 = 0.5625
n= 16
CP nsplit rel error xerror xstd
1 0.01 0 1 0 0
> summary(fit) # detailed summary of splits
Call:
rpart(formula = v5 ~ v1 + v2 + v3 + v4, data = x, method = "class")
n= 16
CP nsplit rel error xerror xstd
1 0.01 0 1 0 0
Node number 1: 16 observations
predicted class=90 expected loss=0.5625 P(node) =1
class counts: 1 1 1 7 1 2 1 1 1
probabilities: 0.062 0.062 0.062 0.438 0.062 0.125 0.062 0.062 0.062
When I try to plot the tree:
# plot tree
plot(fit, uniform=TRUE,
+ main="Classification Tree ")
Error in plot.rpart(fit, uniform = TRUE, main = "Classification Tree ") :
fit is not a tree, just a root
text(fit, use.n=TRUE, all=TRUE, cex=.8)
Error in text.rpart(fit, use.n = TRUE, all = TRUE, cex = 0.8) :
fit is not a tree, just a root
What went wrong? Why does the tree plot fail with an error, and how can I fix it?
Error: fit is not a tree, just a root
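Answer 0 (score: 2)
Use method = "class" if you want to build a classification tree, and method = "anova" if you want to build a regression tree. It looks like you have a continuous response, so you should build a regression tree (i.e. method = "anova").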
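As a quick check of that point, the sketch below inspects the response column. It only repeats v5 from the question so that it runs on its own; the variable name n_levels is introduced here for illustration.

```r
# v5 from the question's data frame, repeated so the snippet is self-contained.
v5 <- c(99, 90, 93, 91, 90, 90, 77, 92, 85, 76, 90, 96, 90, 90, 90, 92)

is.numeric(v5)                  # the response is stored as a number
n_levels <- length(unique(v5))  # how many distinct values occur
n_levels
table(v5)                       # with method = "class" every distinct value becomes a class
```

With method = "class", rpart treats each distinct value as an unordered label, which is consistent with the nine class counts shown in the summary output of the question.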
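Answer 1 (score: 0)
You are using rpart's default control parameters. With your dataset, rpart cannot satisfy those defaults and still split the root node: the default minsplit = 20 is larger than your 16 observations, so no split is ever attempted. The defaults are: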
rpart.control(minsplit = 20, minbucket = round(minsplit/3), cp = 0.01,
maxcompete = 4, maxsurrogate = 5, usesurrogate = 2, xval = 10,
surrogatestyle = 0, maxdepth = 30, ...)
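To see the effect of these defaults on the question's data, here is a minimal sketch. It assumes the rpart package is installed; the names defaults and fit_default are made up for this example.

```r
library(rpart)

# Data frame from the question, repeated so the snippet is self-contained.
x <- data.frame(v1 = c(97, 97, 85, 84, 90, 80, 81, 90, 80, 70, 90, 90, 90, 95, 88, 99),
                v2 = c(99, 91, 91, 83, 99, 95, 74, 88, 82, 80, 96, 87, 92, 96, 88, 95),
                v3 = c(89, 93, 87, 80, 96, 96, 75, 90, 78, 86, 92, 88, 80, 88, 98, 98),
                v4 = c(89, 97, 91, 86, 95, 95, 89, 88, 75, 82, 99, 92, 95, 92, 90, 98),
                v5 = c(99, 90, 93, 91, 90, 90, 77, 92, 85, 76, 90, 96, 90, 90, 90, 92))

defaults <- rpart.control()    # list holding the default control values
defaults$minsplit              # 20
nrow(x)                        # 16, fewer rows than minsplit

# With the defaults the root is never split, whatever the method.
fit_default <- rpart(v5 ~ v1 + v2 + v3 + v4, data = x, method = "anova")
nrow(fit_default$frame)        # one row in the frame means the fit is only a root
```

The frame component has one row per node, so a single row is exactly the "just a root" situation that plot() complains about.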
Adjust the control parameters to suit your dataset.
For example:
t <- rpart(formula = v5 ~ v1 + v2 + v3 + v4, data = x, method = "anova", control = rpart.control(minsplit = 1, minbucket = 1, cp = 0))
Note, however, that this may produce an overfitted decision tree.
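One common way to rein in such a tree is cost-complexity pruning with prune(). The sketch below is only an illustration under assumptions that are not part of the answer: it grows the tree with minsplit = 2, fixes a seed so that the cross-validation is repeatable, and picks the cp value with the smallest cross-validated error. The names full, best_cp and pruned are hypothetical.

```r
library(rpart)

# Data frame from the question, repeated so the snippet is self-contained.
x <- data.frame(v1 = c(97, 97, 85, 84, 90, 80, 81, 90, 80, 70, 90, 90, 90, 95, 88, 99),
                v2 = c(99, 91, 91, 83, 99, 95, 74, 88, 82, 80, 96, 87, 92, 96, 88, 95),
                v3 = c(89, 93, 87, 80, 96, 96, 75, 90, 78, 86, 92, 88, 80, 88, 98, 98),
                v4 = c(89, 97, 91, 86, 95, 95, 89, 88, 75, 82, 99, 92, 95, 92, 90, 98),
                v5 = c(99, 90, 93, 91, 90, 90, 77, 92, 85, 76, 90, 96, 90, 90, 90, 92))

set.seed(1)  # assumption: fixed seed so the cross-validated errors are repeatable

# Grow a deliberately large regression tree (assumed settings, see lead-in).
full <- rpart(v5 ~ v1 + v2 + v3 + v4, data = x, method = "anova",
              control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))

# Pick the complexity parameter with the lowest cross-validated error.
best_cp <- full$cptable[which.min(full$cptable[, "xerror"]), "CP"]

# Cut the tree back to that complexity.
pruned <- prune(full, cp = best_cp)

nrow(full$frame)    # number of nodes before pruning
nrow(pruned$frame)  # number of nodes after pruning
```

With only 16 rows the cross-validated errors are noisy, so the selected cp should be read as a rough guide rather than a firm answer.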
Answer 2 (score: 0)
I ran the following code on your x data frame and got a tree:
library(rpart)
library(rattle)
library(rpart.plot)
library(RColorBrewer)
fit <- rpart(v5 ~ v1+v2+v3+v4,
method="anova",
data=x,
control = rpart.control(minsplit = 6, cp = 0.01))
fancyRpartPlot(fit) # fancyRpartPlot() comes from the rattle package
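Note that your method should be "anova", because v5 is a continuous variable, and that you have to override the control parameters with control = rpart.control(...) to adjust the depth of the tree.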
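If rattle is not available, the same fit can be drawn with the plot() and text() calls from the question. The sketch below assumes only the rpart package; the check on fit$frame and the margin value are additions made for illustration, so that the plotting calls run only when the tree has at least one split.

```r
library(rpart)

# Data frame from the question, repeated so the snippet is self-contained.
x <- data.frame(v1 = c(97, 97, 85, 84, 90, 80, 81, 90, 80, 70, 90, 90, 90, 95, 88, 99),
                v2 = c(99, 91, 91, 83, 99, 95, 74, 88, 82, 80, 96, 87, 92, 96, 88, 95),
                v3 = c(89, 93, 87, 80, 96, 96, 75, 90, 78, 86, 92, 88, 80, 88, 98, 98),
                v4 = c(89, 97, 91, 86, 95, 95, 89, 88, 75, 82, 99, 92, 95, 92, 90, 98),
                v5 = c(99, 90, 93, 91, 90, 90, 77, 92, 85, 76, 90, 96, 90, 90, 90, 92))

# Same model as in this answer: regression tree with a smaller minsplit.
fit <- rpart(v5 ~ v1 + v2 + v3 + v4, method = "anova", data = x,
             control = rpart.control(minsplit = 6, cp = 0.01))

# plot() raises "fit is not a tree, just a root" when there is a single node,
# so draw the tree only if at least one split was made.
if (nrow(fit$frame) > 1) {
  plot(fit, uniform = TRUE, margin = 0.1, main = "Regression Tree")
  text(fit, use.n = TRUE, all = TRUE, cex = 0.8)
} else {
  message("No split was made; relax minsplit, minbucket or cp.")
}
```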