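I am running cross-validation on a training dataset in R. I did this with a random forest, and now I am doing the same with a decision tree, which gives me an error when I run it. For the random forest I ran both 10-fold and 3-fold cross-validation. I am taking an online course to learn data science with R, but I am stuck on this problem and have been trying for hours. The code is: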
#cross validation
library(caret)
library(doSNOW)
set.seed(2348)
cv.10.folds <- createMultiFolds(rf.label, k=10, times = 10)
#check stratification
table(rf.label)
342 / 549
#set up caret's trainControl object per above
ctrl.1 <- trainControl(method = "repeatedcv", number = 10, repeats = 10, index = cv.10.folds)
table(rf.label[cv.10.folds[[33]]])
#set up caret's traincontrol object per above
ctrl.1 <- trainControl(method = "repeatedcv", number = 10, repeats = 10, index = cv.10.folds)
#Set up doSNOW package for multi-core training. This is helpful as we're going
#to be training a lot of trees
cl <- makeCluster(6, types = "SOCK")
registerDoSNOW(c1)
#Set seed for reproducibility and train
set.seed(32384)
rf.4.cv.1 <- train(x = rf.train.4, y = rf.label, method = "rf", tunelength = 3,
ntree = 1000, trControl = ctrl.1)
#Shutdown cluster
stopCluster(cl)
#check out results
rf.4.cv.1
#rework with 3 folds
set.seed(37596)
cv.3.folds <- createMultiFolds(rf.label, k=3, times = 10)
#set up caret's trainControl object per above
ctrl.3 <- trainControl(method = "repeatedcv", number = 3, repeats = 10, index = cv.3.folds)
#set up caret's traincontrol object per above
ctrl.3 <- trainControl(method = "repeatedcv", number = 3, repeats = 10,
index = cv.3.folds)
#Set up doSNOW package for multi-core training. This is helpful as we're going
#to be training a lot of trees
cl <- makeCluster(6, types = "SOCK")
registerDoSNOW(c1)
#Set seed for reproducibility and train
set.seed(94622)
rf.3.cv.1 <- train(x = rf.train.3, y = rf.label, method = "rf", tunelength = 3,
ntree = 1000, trControl = ctrl.3)
#Shutdown cluster
stopCluster(cl)
#check out results
rf.3.cv.1
# Using single Decision tree to better understand what's going on with the features
library(rpart)
library(rpart.plot)
#Using 3 fold cross validation repeated 10 times
#create utility function
rpart.cv <- function(seed, training, labels, ctrl) {
cl <- makeCluster(6, type = "SOCK")
registerDoSNOW(cl)
set.seed(seed)
#Leverage formula interface for training
rpart.cv <- train(x = training, y = labels, method = "rpart", tunelength =30,
trControl = ctrl)
#Shutdown cluster
stopCluster(cl)
return (rpart.cv)
}
#Grab features
features <- c("Pclass", "title", "family.size")
rpart.train.1 <- data.combined[1:891, features]
#Run cross validation and check out results
rpart.1.cv.1 <- rpart.cv(94622, rpart.train.1, rf.label, ctrl.3)
rpart.1.cv.1
#Plot
prp(rpart.1.cv.1$finalModel, type = 0, extra =1, under = TRUE)
When I run it, I get this error message:
Something is wrong; all the Accuracy metric values are missing:
Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :3 NA's :3
Error: Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
> rpart.1.cv.1
Error: object 'rpart.1.cv.1' not found
Answer 0 (score: 0)
I was able to solve it by fitting the tree directly with rpart:
library(rpart)
library(rpart.plot)
#Fit a single classification tree directly with rpart instead of going through caret
rpart.cv <- rpart(Survived ~ Pclass + title + family.size,
                  data = data.combined, method = "class",
                  parms = list(split = "gini"),
                  control = rpart.control(cp = .2, minsplit = 5, minbucket = 5, maxdepth = 10))
rpart.plot(rpart.cv, cex = .5, extra = 4)
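If you still want the caret cross-validation, one possible cause of the error (not confirmed in this thread) is the spelling of `tunelength`. caret's argument is `tuneLength`, so the lowercase version is not matched by `train` and is passed on to the model function instead. rpart stops on arguments it does not recognise, which would make every resample fail and leave all metrics as NA, while randomForest ignores extra arguments, which would explain why the rf runs went through. Below is a minimal sketch of the same pattern with the spelling corrected. It uses the built-in `iris` data as a stand-in for the Titanic features (an assumption made only so the example runs on its own), and the names `folds`, `ctrl` and `fit` are illustrative.

```r
library(caret)
library(rpart)

# Stand-in data (assumption): iris replaces rpart.train.1 / rf.label
data(iris)

# Stratified folds for 3-fold CV repeated 2 times (kept small so it runs fast)
set.seed(37596)
folds <- createMultiFolds(iris$Species, k = 3, times = 2)
ctrl <- trainControl(method = "repeatedcv", number = 3, repeats = 2,
                     index = folds)

# Note the capital L in tuneLength; a lowercase 'tunelength' would be
# forwarded to rpart() rather than being used by train()
set.seed(94622)
fit <- train(x = iris[, 1:4], y = iris$Species, method = "rpart",
             tuneLength = 5, trControl = ctrl)

# Print the resampled accuracy for each candidate cp value
fit
```

In your own code the corresponding change would be `tuneLength = 30` inside `rpart.cv`; the parallel cluster is left out here only to keep the sketch short.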