Question

I trained a random forest:

model <- randomForest(x, y, proximity=TRUE)

When I want to predict y for new objects, I use

y_pred <- predict(model, xnew)

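How can I calculate the proximity between the new objects (xnew) and the training set (x) using the existing forest (model)? The proximity option of the predict function only gives the proximities among the new objects (xnew) themselves. I could run randomForest unsupervised again on the combined data set (x and xnew) to get the proximities, but I think there must be a way to reuse the existing forest instead of building it again.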

Thanks! Kilian

Answer 1

I believe what you want is to specify your test observations in the randomForest call itself, something like this:

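library(randomForest)

set.seed(71)
# Hold out 10 of the 150 iris rows as test cases
ind <- sample(1:150, 140, replace = FALSE)
train <- iris[ind, ]
test <- iris[-ind, ]

# Passing xtest/ytest makes the forest score the test cases while it is grown
iris.rf1 <- randomForest(x = train[, 1:4],
                         y = train[, 5],
                         xtest = test[, 1:4],
                         ytest = test[, 5],
                         importance = TRUE,
                         proximity = TRUE)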

dim(iris.rf1$test$prox)
[1]  10 150

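That gives you the proximity from the ten test cases to all 150 cases (the 10 test cases themselves plus the 140 training cases).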

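The only other option, I think, would be to call predict on your new cases rbind-ed to the original training cases. That way, you don't need to have your test cases up front when you call randomForest.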

In that case, you will want to use keep.forest = TRUE in the randomForest call, and of course set proximity = TRUE when you call predict.
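A minimal sketch of that second approach, reusing the iris split from above. The names iris.rf2, combined, n_new and prox_new_vs_train are made up for illustration, and the slicing assumes that predict returns the proximity matrix with rows and columns in the same order as the rows of the data passed to it:

```r
library(randomForest)

set.seed(71)
ind   <- sample(1:150, 140, replace = FALSE)
train <- iris[ind, ]
test  <- iris[-ind, ]

# Fit once on the training data only, keeping the forest for later use
iris.rf2 <- randomForest(x = train[, 1:4],
                         y = train[, 5],
                         keep.forest = TRUE)

# Stack the new cases on top of the training cases and run them
# through the existing forest, asking for proximities
combined <- rbind(test[, 1:4], train[, 1:4])
pred <- predict(iris.rf2, combined, proximity = TRUE)

# pred$proximity is square over all rows of `combined`;
# keep the rows for the new cases and the columns for the training cases
n_new <- nrow(test)
prox_new_vs_train <- pred$proximity[seq_len(n_new), -seq_len(n_new)]
dim(prox_new_vs_train)
```

Here the forest is grown only once. The predict call just drops the combined rows down the existing trees, so prox_new_vs_train should have one row per new case and one column per training case.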
