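I have been trying to use the R package 'randomForestSRC' to make predictions, but I am confused by the output of 'rfsrc' and 'predict.rfsrc'. Both return a component called predicted, but the predicted values do not seem to correspond to any of the values in my data. Does anyone know what these predicted values represent?
The commands I ran (this is the example from their documentation):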
library(randomForestSRC)
data(veteran, package = "randomForestSRC")
train <- sample(1:nrow(veteran), round(nrow(veteran) * 0.80))
veteran.grow <- rfsrc(Surv(time, status) ~ ., veteran[train, ], ntree = 100)
veteran.pred <- predict(veteran.grow, veteran[-train , ])
The predicted values:
veteran.pred$predicted
[1] 49.96350 58.45100 38.28317 63.17000 67.56917 57.45633 66.23733 54.81967 72.60817 47.71083 43.94983 37.85000
[13] 41.80333 47.84233 85.81488 70.49050 92.45600 70.95321 85.63933 45.38833 66.74655 76.46067 52.68717 68.90750
[25] 85.17983 43.31617 48.80267
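Answer 0 (score: 0)
The predicted value is a measure of mortality. For a survival forest it is an ensemble risk score rather than a predicted survival time, so larger values indicate higher risk. For more information on terminal node estimators and ensemble statistics, see our GitHub page, in particular the Theory and Specifications section.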
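To make "mortality" a bit more concrete, here is a minimal base R sketch of the usual definition: the cumulative hazard function (CHF) summed over the unique event times. It is not the package's code. The toy data, the two hypothetical terminal nodes, and the helper name nelson_aalen are all made up for illustration, and it assumes a Nelson-Aalen estimate of the CHF within each node.

```r
# Toy data (made up): follow-up times and event indicators (1 = death, 0 = censored)
time   <- c(2, 3, 3, 5, 8, 9)
status <- c(1, 1, 0, 1, 0, 1)

# Unique event times over the whole sample (the evaluation grid)
event_times <- sort(unique(time[status == 1]))

# Hypothetical helper: Nelson-Aalen cumulative hazard evaluated on a grid.
# Each increment is (deaths at t) / (cases still at risk at t), or 0 if nobody is at risk.
nelson_aalen <- function(time, status, grid) {
  increments <- sapply(grid, function(t) {
    at_risk <- sum(time >= t)
    if (at_risk == 0) 0 else sum(time == t & status == 1) / at_risk
  })
  cumsum(increments)
}

# Two hypothetical terminal nodes: cases 1-3 (early deaths) and cases 4-6 (late deaths)
node_a <- 1:3
node_b <- 4:6
chf_a <- nelson_aalen(time[node_a], status[node_a], event_times)
chf_b <- nelson_aalen(time[node_b], status[node_b], event_times)

# Mortality: the CHF summed over the event times
mortality_a <- sum(chf_a)
mortality_b <- sum(chf_b)
print(c(node_a = mortality_a, node_b = mortality_b))
```

The node with the earlier deaths gets the larger value, which is why the numbers in predicted behave like a risk score and do not line up with the observed times.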
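Answer 1 (score: 0)
The predicted values from rfsrc and predict.rfsrc are both based on the predictions of all the trees that were grown, using the training data and the test data respectively.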
#In-bag predicted value for the first case in training data
veteran.grow$predicted[1]
> 80.56843
#Prediction based on all trees for the same case
predict(veteran.grow,
newdata=veteran[train[1],])$predicted
> 80.56843
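rfsrc also returns the out-of-bag (OOB) prediction as predicted.oob. This is based only on the trees that did not use the case when they were grown. For example, if case 1 was used to grow trees 1 to 30, the OOB prediction for case 1 is based on trees 31 to 100 rather than on all trees.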
#Keeping the info about nodes of each tree
veteran.grow <- rfsrc(Surv(time, status) ~ ., veteran[train, ], ntree = 100,
membership=TRUE)
#Out-of-bag predicted value for the first case
veteran.grow$predicted.oob[1]
> 72.88305
#Prediction based on the trees that case 1 was not included in
ind <- which(veteran.grow$inbag[1,]==0)
predict(veteran.grow,
newdata=veteran[train[1],],
get.tree=ind)$predicted
> 72.88305
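The difference between the two kinds of prediction can also be sketched without the package. The following is a toy illustration in base R, assuming the ensemble value is a simple average of per-tree predictions. The names tree_pred and inbag and the simulated numbers are hypothetical and are not produced by rfsrc.

```r
set.seed(1)
n_trees <- 100

# Hypothetical per-tree predictions for one case (simulated, not from rfsrc)
tree_pred <- rnorm(n_trees, mean = 75, sd = 10)

# Hypothetical in-bag indicator: the case was used to grow trees 1 to 30 only
inbag <- c(rep(1, 30), rep(0, 70))

# Prediction based on all trees
all_trees_pred <- mean(tree_pred)

# OOB prediction: average only over the trees that never saw the case
oob_trees <- which(inbag == 0)
oob_pred <- mean(tree_pred[oob_trees])

print(c(all_trees = all_trees_pred, oob = oob_pred))
```

Here oob_trees is trees 31 to 100, which matches the example above. The OOB value is usually the more honest one for training cases, because none of the trees that contribute to it were fit on that case.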