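I have some code where I fit a tree and then automatically prune it by selecting the complexity parameter that minimizes the cross-validation error, as displayed by the printcp() function. When digesting my console output, I am annoyed by the amount of output that printcp() prints.
What I do is convert the output of printcp() to a data frame and then use some logic to extract the CP value with the lowest CV error. Is there any way I can do this without printing the output of printcp() to the console?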
df_tree_1 <- rpart(formula(df_lm_2), cp = 0.0001, data = train)
cp_df <- data.frame(printcp(df_tree_1))
df_tree_1 <- prune.rpart(tree = df_tree_1, cp = cp_df$CP[which(cp_df$xerror == min(cp_df$xerror))])
Answer 0 (score: 1)
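Your rpart()-fitted tree object contains a table named "cptable" that holds the values you are looking for. The printcp() function only displays this table, so what you really want to do is pull the value straight from the table when you call prune(). Here is how you can do that: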
library(rpart) # for the rpart function
library(rattle) # for "weather" dataset and for "fancy" tree plotter
# fit model using rpart
fit <- rpart(RainTomorrow ~ Rainfall + Evaporation + Sunshine + WindGustDir,
             data = weather,
             method = "class")
# visualize with rattle
fancyRpartPlot(fit)
# prune by returning the value in the column of fit$cptable (a table)
# corresponding to the row that has the minimum "xerror" value
fit_autoprune <- prune(tree = fit,
                       cp = fit$cptable[which.min(fit$cptable[, "xerror"]),
                                        "CP"])
# visualize again to see difference
fancyRpartPlot(fit_autoprune)
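If you prune like this in more than one place, the same lookup can be wrapped in a small function. The following is only a minimal sketch: the helper name prune_min_xerror is hypothetical (it is not part of rpart), and it uses the kyphosis dataset that ships with rpart instead of the weather data, so that it runs without rattle. Nothing is printed to the console, because the table is read directly from fit$cptable.

```r
library(rpart)  # provides rpart(), prune() and the kyphosis dataset

# Hypothetical helper (not part of rpart): prune a fitted tree at the CP value
# whose cross-validated error ("xerror") is lowest.
prune_min_xerror <- function(fit) {
  cp_table <- fit$cptable                       # a matrix; reading it prints nothing
  best_row <- which.min(cp_table[, "xerror"])   # first row with the smallest xerror
  prune(fit, cp = cp_table[best_row, "CP"])
}

set.seed(42)  # the cross-validation folds are random, so fix the seed
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis,
             method = "class",
             cp = 0.0001)

fit_pruned <- prune_min_xerror(fit)
```

Note that which.min() returns only the first minimum, whereas which(x == min(x)) from the question can return several rows when there are ties, which would pass more than one cp value to the pruning call.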