How to resolve a cross-validation error in R

Time: 2019-04-13 13:57:10

Tags: r cross-validation r-caret

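I am running cross-validation on a training dataset in R. I did this with a random forest, and now I'm using a decision tree, which gives me an error when I run it. For the random forest I ran 10-fold and 3-fold cross-validation. I'm taking an online course to learn data science with R and got stuck on this; I've been trying for hours. The code is: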

 #cross validation
library(caret)
library(doSNOW) 

set.seed(2348)
cv.10.folds <- createMultiFolds(rf.label, k=10, times = 10)

#check stratification
table(rf.label)
342 / 549

#set up caret's trainControl object per above
ctrl.1 <- trainControl(method = "repeatedcv", number = 10, repeats = 10, index = cv.10.folds)

table(rf.label[cv.10.folds[[33]]])

#set up caret's traincontrol object per above
ctrl.1 <- trainControl(method = "repeatedcv", number = 10, repeats = 10, index = cv.10.folds)

#Set up doSNOW package for multi-core training. This is helpful as we're going
#to be training a lot of trees

cl <- makeCluster(6, type = "SOCK")
registerDoSNOW(cl)

#Set seed for reproducibility and train
set.seed(32384)
rf.4.cv.1 <- train(x = rf.train.4, y = rf.label, method = "rf", tunelength = 3,
                                ntree = 1000, trControl = ctrl.1)

#Shutdown cluster
stopCluster(cl)

#check out results
rf.4.cv.1

#rework with 3 folds
set.seed(37596)
cv.3.folds <- createMultiFolds(rf.label, k=3, times = 10)


#set up caret's trainControl object per above
ctrl.3 <- trainControl(method = "repeatedcv", number = 3, repeats = 10, index = cv.3.folds)



#set up caret's traincontrol object per above
ctrl.3 <- trainControl(method = "repeatedcv", number = 3, repeats = 10,
                       index = cv.3.folds)

#Set up doSNOW package for multi-core training. This is helpful as we're going
#to be training a lot of trees

cl <- makeCluster(6, type = "SOCK")
registerDoSNOW(cl)

#Set seed for reproducibility and train
set.seed(94622)
rf.3.cv.1 <- train(x = rf.train.3, y = rf.label, method = "rf", tunelength = 3,
                   ntree = 1000, trControl = ctrl.3)

#Shutdown cluster
stopCluster(cl)

#check out results
rf.3.cv.1

# Using single Decision tree to better understand what's going on with the features

library(rpart)
library(rpart.plot)

#Using 3 fold cross validation repeated 10 times

#create utility function
rpart.cv <- function(seed, training, labels, ctrl) {
  cl <- makeCluster(6, type = "SOCK")
  registerDoSNOW(cl)
  set.seed(seed)

#Train through caret's x/y interface (predictors and labels passed separately)
  rpart.cv <- train(x = training, y = labels, method = "rpart", tunelength =30,
                    trControl = ctrl)

#Shutdown cluster
  stopCluster(cl)

  return (rpart.cv)

}


#Grab features
features <- c("Pclass", "title", "family.size")
rpart.train.1 <- data.combined[1:891, features]

#Run cross validation and check out results
rpart.1.cv.1 <- rpart.cv(94622, rpart.train.1, rf.label, ctrl.3)
rpart.1.cv.1

#Plot
prp(rpart.1.cv.1$finalModel, type = 0, extra =1, under = TRUE)

When I run it, I get this error message:

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :3     NA's   :3    
Error: Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :

> rpart.1.cv.1
Error: object 'rpart.1.cv.1' not found

1 answer:

Answer 0: (score: 0)

I was able to solve it with the following:

   method = "class", parms = list(split = "gini"), data = data.combined, control = rpart.control(cp = .2, minsplit = 5, minbucket = 5, maxdepth = 10))


rpart.cv <- rpart(Survived~ Pclass + title + family.size,
   data = data.combined, method = "class")

  rpart.plot(rpart.cv, cex =.5, extra =4)
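
The thread never says why the `train()` call in the question failed. One plausible cause, not confirmed by the asker or the answer, is the spelling `tunelength`: caret's argument is `tuneLength`, so the lowercase name is not consumed by `train()` and is passed on to `rpart()`, which stops on arguments it cannot match. The sketch below is a minimal illustration under that assumption. It uses the built-in `iris` data as a stand-in for the Titanic features, and `direct.fit`, `ctrl.demo` and `rpart.demo` are hypothetical names that do not appear in the thread.

```r
library(rpart)

# Stand-in data: iris replaces the asker's rpart.train.1 / rf.label,
# which are not available here (assumption for illustration only).

# 1. Calling rpart() directly with the misspelled argument: rpart() refuses
#    arguments that are not rpart.control() parameters, so this errors.
direct.fit <- tryCatch(
  rpart(Species ~ ., data = iris, method = "class", tunelength = 30),
  error = function(e) e
)
print(inherits(direct.fit, "error"))

# 2. With the caret spelling `tuneLength`, train() uses the value itself to
#    build the cp tuning grid and does not forward it to rpart().
#    Guarded so the sketch still runs where caret is not installed.
if (requireNamespace("caret", quietly = TRUE)) {
  library(caret)
  set.seed(94622)
  ctrl.demo <- trainControl(method = "repeatedcv", number = 3, repeats = 2)
  rpart.demo <- train(x = iris[, 1:4], y = iris$Species,
                      method = "rpart", tuneLength = 5,
                      trControl = ctrl.demo)
  print(rpart.demo)
}
```

If this is indeed the cause, changing `tunelength = 30` to `tuneLength = 30` inside the `rpart.cv` helper would keep the repeated cross-validation from the question, whereas the plain `rpart()` call in the answer fits a single tree without cross-validation.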