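I am using rpart to build a classification model for my data, but I don't know how to set the bucket size (minbucket) so that the model neither overfits nor underfits. I read that the train method of the caret package offers a way to find the optimal value, so I wrote a few lines in R: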
# (I have anonymized the formula of my model)
tree <- rpart(y ~ x1 + x2 + x3 + x4 + x5 + x6, method = 'class', data = train, minbucket = 15)
numfolds <- trainControl(method = "cv", number = 10)
cpGrid <- expand.grid(.cp = seq(0.0001, 0.005, 0.0001))
train(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = train, method = "rpart", trControl = numfolds, tuneGrid = cpGrid)
The printed output gives:
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was cp = 0.0024.
OK, so I noted this and used cp = 0.0024 in my rpart model:
treeCV <- rpart(y ~ x1 + x2 + x3 + x4 + x5 + x6, method = 'class', data = train, cp = 0.0024)
prp(treeCV)
All I have is the "prp" visualization.
Any help? Let me know if more information is needed.
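
For reference, here is a minimal sketch of one way minbucket could be compared by cross-validation. caret's "rpart" method tunes only cp, so the sketch loops over candidate minbucket values and passes each one to rpart through the control argument of train. The data are an assumption: the built-in iris set stands in for the anonymized data, and the candidate values in buckets and the cp grid are arbitrary. The outcome is a factor here; the "RMSE" line in the output above suggests y may have been treated as numeric, in which case caret runs a regression rather than a classification.

```r
library(caret)
library(rpart)

# Assumption: iris replaces the anonymized data; Species is a factor,
# so caret treats the problem as classification and reports Accuracy.
ctrl <- trainControl(method = "cv", number = 10)
cp_grid <- expand.grid(cp = seq(0.001, 0.05, by = 0.005))

# Hypothetical candidate bucket sizes to compare.
buckets <- c(5, 10, 15, 20)

results <- lapply(buckets, function(b) {
  set.seed(42)  # same folds for every minbucket value
  fit <- train(Species ~ ., data = iris, method = "rpart",
               trControl = ctrl, tuneGrid = cp_grid,
               control = rpart.control(minbucket = b))
  # Keep the best cross-validated cp for this bucket size.
  best <- fit$results[which.max(fit$results$Accuracy), ]
  data.frame(minbucket = b, cp = best$cp, Accuracy = best$Accuracy)
})
results <- do.call(rbind, results)

# Rank the (minbucket, cp) pairs by cross-validated accuracy.
print(results[order(-results$Accuracy), ])
```

The top row of the printed table gives a minbucket and cp pair that can then be passed to rpart directly, for example via control = rpart.control(minbucket = ..., cp = ...).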