Question

I built a decision tree with rpart, and I would like to know how to find exactly which cases of the training data fall into each terminal node.

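I followed the answer at this link: How to count the observations falling in each node of a tree. For some reason, though, $where only produces a vector of terminal nodes, with no row numbers indicating which case corresponds to which terminal node. If I do exactly the same thing with a tree built with the tree package, I get a list of row numbers with the corresponding terminal nodes (identifying each case). The only difference I noticed is that for the rpart package $where produces an "int" vector, whereas for the tree package $where produces a "Named int" vector. How can I produce the same "Named int" vector for a tree built with rpart?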

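I also tried the suggestion in: Find the data elements in a data frame that pass the rule for a node in a tree model? It did not work for me, because rpart dropped 16 observations when building the model, so the number of observations in the resulting model does not match the original data frame used to build it.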

Sorry if the answer is obvious, novice R user here!

Below is the code I used to create the tree, which predicts an autism diagnosis from behavioral features:

library(caret)  # createDataPartition
library(rpart)  # rpart, prune, plotcp, printcp

set.seed(565808016)
inTrain21<- createDataPartition(clinicaldiagnosis, p=0.75, list=FALSE)
training_data21<- Decisiontree4[ inTrain21,]
testing_data21<- Decisiontree4[-inTrain21,]
test_clinicaldiagnosis21<-clinicaldiagnosis[-inTrain21]
lossmatrix=matrix(c(0,1,1,1,0,1,2,1,0), ncol=3, nrow=3)

set.seed(591251974)
tree_model22= rpart(clinicaldiagnosis~ Visualtracking + etc etc, training_data21, na.action=na.rpart, method="class", control=rpart.control(cp=0.00001), parms=list(loss=lossmatrix))
plot(tree_model22, uniform=TRUE, margin=0.05)
text(tree_model22, use.n=TRUE, pretty=0)
plotcp(tree_model22)
printcp(tree_model22)

pruned_model22=prune(tree_model22, cp=0.0146341)
plot(pruned_model22, uniform=TRUE, margin=0.1)
text(pruned_model22, use.n=TRUE, cex=0.85, splits=TRUE, pretty=0)

tree_pred22=predict(pruned_model22, testing_data21, type="class")
table(tree_pred22, test_clinicaldiagnosis21)
trainingnodes22<-rownames(pruned_model22$frame)[pruned_model22$where] #this only gives a list of terminal nodes without the corresponding row names
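
For reference, here is a minimal sketch of the kind of "Named int" result I am after. It is one possible approach, not necessarily the recommended one. It assumes the tree is fitted with model=TRUE so that rpart keeps the model frame it actually used (after dropping rows), whose row names should line up with $where. The built-in kyphosis data stands in for my data, and the names dat, fit, kept_rows, node_of_row and rows_by_node are purely illustrative.

```r
library(rpart)

# Stand-in data: kyphosis ships with rpart. Blank out a few responses so that
# na.rpart drops rows, similar to the 16 observations dropped in my model.
dat <- kyphosis
dat$Kyphosis[c(3, 10, 25)] <- NA

# model = TRUE asks rpart to store the model frame it actually used.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = dat,
             method = "class", na.action = na.rpart, model = TRUE)

# fit$where gives, for each retained row, the row index into fit$frame;
# the row names of fit$frame are the node numbers shown by print(fit).
node_id   <- rownames(fit$frame)[fit$where]
kept_rows <- rownames(fit$model)   # row names of the rows rpart kept

# Named integer vector: names are original row names, values are node numbers.
node_of_row <- setNames(as.integer(node_id), kept_rows)

# The same information grouped by terminal node.
rows_by_node <- split(kept_rows, node_id)

str(node_of_row)
print(rows_by_node)
```

With this, dat[names(node_of_row), ] would return only the rows that were used in the fit, in the same order as the node assignments, so the dropped observations no longer cause a length mismatch.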

rpart: finding the observations in each node

0 answers: