Question

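I used the caret package in R to compare different methods (PLS-DA, support vector machine, artificial neural network, random forest) on the same dataset with stratified 10-fold cross-validation. The dataset has 1394 records. When comparing the results, I noticed that random forest has a higher area under the ROC curve than the other models, even though its sensitivity and specificity are similar to theirs. Do models with similar sensitivity and specificity always have a similar area under the ROC curve?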

Here is the code for PLS-DA (ANN and linear SVM gave similar results) and for random forest:

PLS-DA

Ycalib<-factor(file2[,1121],levels=c("1","0"),labels=c("pregnant","open")) # create the factor vector 
names(Ycalib)<-c("y")
Xcalib<-data.frame(file2[,1126:1663]) # create the data frame with spectral data

set.seed(1001) 
folds<-createFolds(Ycalib,k=10,list = TRUE, returnTrain = TRUE)  # stratified folds for cross-validation 

set.seed(1001) 
ctrl<-trainControl(method="repeatedcv",index=folds,classProbs = TRUE,summaryFunction = twoClassSummary,savePredictions = TRUE) 

set.seed(1001)
plsda<-train(x=Xcalib, # spectral data
              y=Ycalib, # factor vector
              method="pls", # pls-da algorithm
              tuneLength=60, # number of components
              trControl=ctrl, # ctrl contained cross-validation option
              preProc=c("center","scale"), # the data are centered and scaled
              metric="ROC") # metric is ROC for 2 classes
plsda

Random forest

Ycalib<-factor(file2[,1121],levels=c("1","0"),labels=c("pregnant","open")) # create the factor vector 
names(Ycalib)<-c("y")
Xcalib<-data.frame(file2[,1126:1663]) # create the data frame with spectral data

mtry<-tuneRF(Xcalib, Ycalib, stepFactor=1) # automatically set the good value for mtry 
mtry


set.seed(1001)
folds<-createFolds(Ycalib,k=10,list = TRUE, returnTrain = TRUE) 

set.seed(1001)
ctrl<-trainControl(method="repeatedcv",index=folds,classProbs = TRUE,summaryFunction = twoClassSummary,savePredictions = TRUE)

customRF <- list(type = "Classification", library = "randomForest", loop = NULL) # custom model so that mtry and ntree can both be set through a grid in the train function below
customRF$parameters <- data.frame(parameter = c("mtry", "ntree"), class = rep("numeric", 2), label = c("mtry", "ntree"))
customRF$grid <- function(x, y, len = NULL, search = "grid") {}
customRF$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
randomForest(x, y, mtry = param$mtry, ntree=param$ntree, ...)}
customRF$predict <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
predict(modelFit, newdata)
customRF$prob <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
predict(modelFit, newdata, type = "prob")
customRF$sort <- function(x) x[order(x[,1]),]
customRF$levels <- function(x) x$classes
customRF

grid <- expand.grid(mtry = 23, ntree = c(500, 1000) ) # change mtry according to the result of the tuneRF function above; ntree can also be changed here

set.seed(1001)
rdforest<-train(x=Xcalib, # spectral data
          y=Ycalib, # factor vector
          method=customRF, # random forest algorithm (customRF instead of 'rf' to be able to choose the mtry and ntree using a grid)
          trControl=ctrl, # ctrl contained cross-validation option
          preProc=c("center","scale"), # the data are centered and scaled
          metric="ROC", # metric is ROC for 2 classes. Accuracy is used for multiple classes
          tuneGrid = grid) 
rdforest
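Because both `trainControl` calls use `classProbs = TRUE` and `savePredictions = TRUE`, the held-out class probabilities should be available in `plsda$pred` and `rdforest$pred`, so sensitivity and specificity can be tabulated at more than one probability cutoff. The following is only a minimal base R sketch of that idea: `cutoff_table` and `mock_pred` are hypothetical names, and the numbers in `mock_pred` are made up to stand in for a real `$pred` data frame (assumed to have a factor column `obs` and one probability column per class).

```r
# Sensitivity/specificity at several probability cutoffs.
# `pred_df` is ASSUMED to look like the `$pred` slot of a caret train object:
# a factor column `obs` plus one probability column per class ("pregnant").
cutoff_table <- function(pred_df, positive = "pregnant",
                         cutoffs = c(0.25, 0.5, 0.75)) {
  is_pos <- pred_df$obs == positive   # TRUE for observed positives
  prob   <- pred_df[[positive]]       # predicted probability of the positive class
  rows <- lapply(cutoffs, function(k) {
    called_pos <- prob >= k           # classify as positive at this cutoff
    data.frame(cutoff = k,
               Sens = mean(called_pos[is_pos]),     # true positive rate
               Spec = mean(!called_pos[!is_pos]))   # true negative rate
  })
  do.call(rbind, rows)
}

# Hypothetical stand-in for plsda$pred (made-up numbers, not real results).
mock_pred <- data.frame(
  obs = factor(c("pregnant", "pregnant", "pregnant", "open", "open", "open"),
               levels = c("pregnant", "open")),
  pregnant = c(0.80, 0.60, 0.20, 0.70, 0.30, 0.10)
)
mock_pred$open <- 1 - mock_pred$pregnant

print(cutoff_table(mock_pred))
```

With the real objects, `$pred` holds one block of rows per tuning value, so it would first need to be restricted to the selected model, for example `plsda$pred[plsda$pred$ncomp == plsda$bestTune$ncomp, ]`, before being passed to a function like this.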

Here are the results:

PLS-DA results

ncomp  ROC        Sens        Spec     
47     0.7382311  0.57758621  0.8119994

Random forest results

mtry  ntree  ROC        Sens       Spec     
23    500   0.8434449  0.5896552  0.8158085

PLS-DA cross-validated (10-fold, repeated 1 time) confusion matrix

                    Reference
      Prediction pregnant open
        pregnant     24.0 11.0
        open         17.6 47.4

      Accuracy (average) : 0.7145

Random forest cross-validated (10-fold, repeated 1 time) confusion matrix

                Reference
      Prediction pregnant open
        pregnant     25.7 10.0
        open         15.9 48.4

       Accuracy (average) : 0.7403
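For context on the numbers above: in caret's `twoClassSummary`, Sens and Spec are computed from the predicted classes, which is a single operating point (effectively a 0.5 probability cutoff for two classes), whereas ROC is computed from the class probabilities over all cutoffs. The toy sketch below, in base R with made-up scores (not the data from this question; `auc`, `sens_spec`, `score_a` and `score_b` are hypothetical names), shows two sets of scores that give identical sensitivity and specificity at 0.5 but very different areas under the curve.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive gets a higher score than a randomly chosen negative (ties count 1/2).
auc <- function(score, is_pos) {
  r     <- rank(score)              # average ranks handle ties
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Sensitivity and specificity at one probability cutoff.
sens_spec <- function(score, is_pos, cutoff = 0.5) {
  called_pos <- score >= cutoff
  c(Sens = mean(called_pos[is_pos]), Spec = mean(!called_pos[!is_pos]))
}

# Made-up example: 5 positives followed by 5 negatives.
is_pos <- rep(c(TRUE, FALSE), each = 5)

# Scores A: cases on the wrong side of 0.5 are still ranked sensibly.
score_a <- c(0.90, 0.80, 0.70, 0.40, 0.30,   0.60, 0.20, 0.20, 0.10, 0.10)
# Scores B: same calls at 0.5, but the misclassified cases are ranked badly.
score_b <- c(0.55, 0.55, 0.55, 0.05, 0.05,   0.95, 0.45, 0.45, 0.45, 0.45)

print(sens_spec(score_a, is_pos))  # Sens 0.6, Spec 0.8
print(sens_spec(score_b, is_pos))  # Sens 0.6, Spec 0.8
print(auc(score_a, is_pos))        # 0.92
print(auc(score_b, is_pos))        # 0.48
```

The two confusion matrices at the 0.5 cutoff are identical here, so any difference in AUC comes entirely from how the scores are ordered away from that cutoff.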

Similar sensitivity and specificity but different area under the ROC curve: comparing different methods using caret

0 answers: