Question

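I'm sorry to post this question again, but I really need help with it. I'm trying to calculate the AUC on the training set for a randomForest model in R. I have two ways of calculating it, but they give different results. Below is a reproducible example of my problem. I'd really appreciate any help!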

library(randomForest)
library(pROC)
library(ROCR)
# prep training to binary outcome
train <- iris[iris$Species %in% c('virginica', 'versicolor'),]
train$Species <- droplevels(train$Species)

# build model
rfmodel <- randomForest(Species~., data=train, importance=TRUE, ntree=2)

#the first way to calculate training auc
rf_p_train <- predict(rfmodel, type="prob",newdata = train)[,2]
rf_pr_train <- prediction(rf_p_train, train$Species)
r_auc_train1 <- performance(rf_pr_train, measure = "auc")@y.values[[1]] 
r_auc_train1    #0.9888


# the second way to calculate training auc
rf_p_train <- as.vector(rfmodel$votes[,2])
rf_pr_train <- prediction(rf_p_train, train$Species)
r_auc_train2 <- performance(rf_pr_train, measure = "auc")@y.values[[1]]
r_auc_train2    #0.9175

Answer 1

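To get the same result from both approaches, drop the newdata argument from the first call. As the package documentation for predict explains, when newdata is missing the function returns the out-of-bag (OOB) prediction stored in the model object, whereas passing newdata = train sends every training row through all of the trees, including the trees that were fit on that row: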

rf_p_train <- predict(rfmodel, type="prob")[,2]
rf_pr_train <- prediction(rf_p_train, train$Species)
r_auc_train1 <- performance(rf_pr_train, measure = "auc")@y.values[[1]] 
r_auc_train1

which returns:

[1] 0.8655172

The second approach uses the OOB votes, as described in the package documentation for the randomForest function:

rf_p_train <- as.vector(rfmodel$votes[,2])
rf_pr_train <- prediction(rf_p_train, train$Species)
r_auc_train2 <- performance(rf_pr_train, measure = "auc")@y.values[[1]]
r_auc_train2

which returns the same result:

[1] 0.8655172
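
To see all three quantities side by side, here is a minimal sketch rather than a definitive recipe. The assumptions are mine and not part of the original question: a fixed seed (42) so the run is repeatable, ntree = 500 instead of 2 so that every row is out-of-bag for at least one tree, and a small hypothetical helper auc_of that wraps the ROCR calls used above.

```r
library(randomForest)
library(ROCR)

set.seed(42)  # assumption: any fixed seed, only so the run is repeatable

# Same two-class subset of iris as in the question
train <- iris[iris$Species %in% c("virginica", "versicolor"), ]
train$Species <- droplevels(train$Species)

# Assumption: 500 trees (the question used 2), so no row is left without OOB votes
rfmodel <- randomForest(Species ~ ., data = train, ntree = 500)

# Hypothetical helper: AUC of a score vector against the labels, via ROCR
auc_of <- function(scores, labels) {
  pred <- prediction(as.vector(scores), labels)
  performance(pred, measure = "auc")@y.values[[1]]
}

# Resubstitution: every tree votes, including trees trained on the row itself
p_resub <- predict(rfmodel, newdata = train, type = "prob")[, 2]
# Out-of-bag: only trees that did not see the row vote
p_oob   <- predict(rfmodel, type = "prob")[, 2]
# Stored OOB vote fractions from the fitted object
p_votes <- rfmodel$votes[, 2]

auc_resub <- auc_of(p_resub, train$Species)
auc_oob   <- auc_of(p_oob, train$Species)
auc_votes <- auc_of(p_votes, train$Species)

# TRUE when predict() without newdata matches the stored OOB votes
same_oob <- isTRUE(all.equal(as.vector(p_oob), as.vector(p_votes)))

print(c(resub = auc_resub, oob = auc_oob, votes = auc_votes))
```

The resubstitution figure is the optimistic one, since each row is scored partly by trees that were grown on it. The OOB figure is usually the better estimate of how the model will do on unseen data.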
