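I am running a simple test of how to build regression trees with rpart, and with some hand-made data I ran into a surprising behaviour in R:
- when growing the tree with maxdepth = 1 -> no split at all!
- when growing the tree with maxdepth = 2 -> 2 splits are made!
Why is no split made with maxdepth = 1? I guess some parameter of the rpart function is "preventing growth", but which one?
Here are the data and code used: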
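# load the packages used below
library(dplyr)  # provides %>% and mutate()
library(rpart)  # provides rpart() and rpart.control()
# generate some data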
set.seed(1234)
x <- runif(200, min=0, max=1)
y <- runif(200, min=0, max=1)
mydf <- cbind.data.frame(x, y)
mydf <- mydf %>% mutate(target = ifelse(
  ((x > 0.2) & (x < 0.5) | (x > 0.7) & (x < 0.9)) & (y > 0.1) & (y < 0.8), 1, 0))
# to look at data that was generated
plot(mydf$x, mydf$y,
     main = "Observations (red triangles stand for Target = 1)", # title
     col = mydf$target + 1, # colours defined by an integer (here 1 or 2)
     pch = 16 + mydf$target) # point symbol 16 (circle) or 17 (triangle)
abline(v = c(0.2, 0.5, 0.7, 0.9), lty = 2, col = "grey")
abline(h = c(0.1, 0.8), lty = 2, col = "grey")
# grow a tree (target_factor is a factor, so rpart fits a classification tree)
mydf$target_factor <- as.factor(ifelse(mydf$target == 1, "success", "failure"))
predictors <- c("x", "y")
predictors <- paste(predictors,collapse = "+")
formula <- paste("target_factor",predictors,sep="~")
formula <- as.formula(formula)
myregressiontree <- rpart(formula, data = mydf, control = rpart.control(maxdepth = 1))
print(myregressiontree)
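
One way to investigate which control parameter is "preventing growth" is to print the defaults of rpart.control() and then relax the most likely candidate. The sketch below is an assumption, not a confirmed diagnosis: it supposes that the complexity parameter cp (default 0.01) is the culprit. The reasoning is that "failure" is the majority class overall, and probably still the majority on both sides of any single split, so one split alone would not lower the misclassification error and would be discarded. It also assumes that a negative cp makes rpart keep every candidate split. The data are rebuilt with base R only so that the block runs on its own; the name fit_cp is hypothetical.

```r
library(rpart)

# Rebuild the data from the question using base R only
set.seed(1234)
x <- runif(200, min = 0, max = 1)
y <- runif(200, min = 0, max = 1)
in_band <- ((x > 0.2) & (x < 0.5)) | ((x > 0.7) & (x < 0.9))
target <- ifelse(in_band & (y > 0.1) & (y < 0.8), 1, 0)
mydf <- data.frame(
  x = x,
  y = y,
  target_factor = factor(ifelse(target == 1, "success", "failure"))
)

# Class balance at the root: which class does the root node predict?
print(table(mydf$target_factor))

# Default control values that can stop a split or discard it afterwards
defaults <- rpart.control()
print(defaults[c("minsplit", "minbucket", "cp", "maxdepth")])

# Same depth-1 tree, but with the complexity threshold relaxed.
# Assumption: a negative cp keeps a split even if it does not reduce the error.
fit_cp <- rpart(target_factor ~ x + y, data = mydf,
                control = rpart.control(maxdepth = 1, cp = -1))
print(fit_cp)
```

If the tree printed by fit_cp shows a split while the original call does not, that would point to cp rather than minsplit or minbucket as the parameter blocking the split at maxdepth = 1.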