Question

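I am using the randomForest package in R to build a random forest for a classification problem. The curves produced by the plot.randomForest function differ from what I get from the confusion matrix when I predict on the training data itself (rather than on a test set). My intuition was that if I predicted the training set itself, the misclassification rate from the confusion matrix would be similar to the curves produced by plot.randomForest. However, the curves and the confusion matrix tell me different things. I am not sure why this happens, but my guess is that all the curves produced by plot.randomForest are based on out-of-bag error, which is why they indicate lower accuracy than the confusion matrix (this is only a guess and may not be correct). I would appreciate it if someone could tell me whether my thinking is right.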

Here is a reproducible example using the iris data.

    library(datasets)
    library(gmodels)
    library(randomForest)

    data(iris)

    # Fit a 50-tree random forest on the full iris data
    set.seed(123)
    rf.train <- randomForest(Species ~
                                 Sepal.Length +
                                 Sepal.Width +
                                 Petal.Length +
                                 Petal.Width,
                             data = iris,
                             ntree = 50,
                             importance = TRUE)

    # Error curves as a function of the number of trees
    plot(rf.train, main = "Error Rate vs Number of Trees In the Forest")

    # Predict on the same data the forest was trained on
    predictions <- predict(rf.train, newdata = iris)
    mydata_with_predictions <- cbind(iris, predictions)

    # Confusion matrix
    CrossTable(mydata_with_predictions$Species,
               mydata_with_predictions$predictions,
               prop.chisq = FALSE,
               prop.t = FALSE)
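
One way to check the out-of-bag guess is to compare the numbers directly. The sketch below is only an illustration, not a definitive answer: it reads the `err.rate` component of the fitted object (the matrix that `plot.randomForest` draws for classification forests, with an "OOB" column plus one column per class) and compares its last row with the error obtained by predicting on the training data. The variable names (`rf`, `oob_err`, `oob_err_check`, `train_err`) are made up for this example.

```r
library(randomForest)

data(iris)
set.seed(123)
rf <- randomForest(Species ~ ., data = iris, ntree = 50)

# Last row of err.rate: out-of-bag error once all trees have been grown.
# This is the right-hand end of the "OOB" curve in plot(rf).
oob_err <- rf$err.rate[rf$ntree, "OOB"]

# predict() without newdata returns the out-of-bag predictions,
# so this should be close to (or equal to) oob_err.
oob_pred <- predict(rf)
oob_err_check <- mean(oob_pred != iris$Species, na.rm = TRUE)

# Resubstitution error: every tree votes on every row, including the rows
# it was trained on, so this is usually much lower than the OOB error.
train_pred <- predict(rf, newdata = iris)
train_err <- mean(train_pred != iris$Species)

print(c(oob = oob_err, oob_check = oob_err_check, train = train_err))

# OOB confusion matrix stored in the fitted object
print(rf$confusion)
```

If the two OOB figures agree with the end of the plotted curve while `train_err` is noticeably smaller, that would support the guess that the curves show out-of-bag error rather than training-set error.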

Confusion matrix vs. error curves from the "plot.randomForest" function in R

0 answers: