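I have a question about the mlr package.
After tuning the randomForest hyperparameters with cross-validation,
getLearnerModel(rforest) returns a model that was fit on the whole dataset rather than with CV, is that correct?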
# train task
trainTask <- makeClassifTask(data = trainsample, target = "DIED30", positive = "1")
# random forest tuning
# importance = TRUE is set here; assigning rf$par.vals afterwards would overwrite ntree and mtry
rf <- makeLearner("classif.randomForest", predict.type = "prob",
                  par.vals = list(ntree = 1000, mtry = 3, importance = TRUE))
rf_param <- makeParamSet(
  makeDiscreteParam("ntree", values = c(500, 750, 1000, 2000)),
  makeIntegerParam("mtry", lower = 1, upper = 15),
  makeDiscreteParam("nodesize", values = 1:20)
)
rancontrol <- makeTuneControlGrid()
set_cv <- makeResampleDesc("CV", iters = 10L)
rf_tune <- tuneParams(learner = rf, resampling = set_cv, task = trainTask,
                      par.set = rf_param, control = rancontrol, measures = auc)
rf_tune$x
rf.tree <- setHyperPars(rf, par.vals = rf_tune$x)
# train best model on the full training task
rforest <- train(rf.tree, trainTask)
getLearnerModel(rforest)
# predict (note: these are predictions on the training data itself)
pforest <- predict(rforest, trainTask)
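So in the end, rforest is an RF model trained on the whole dataset, not with cross-validation.
Is there a way in mlr to do the final training using CV?
I plan to validate the results on an external dataset. Before running on the external dataset, should I train the model with 10-fold CV (I don't know how), or should I just use the parameters found by the 10-fold CV hyperparameter search?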
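To make the second option concrete, here is a minimal sketch of what I think it would look like. The toy data, the helper `make_toy`, the name `externalsample`, and the fixed hyperparameter values are all hypothetical stand-ins for my real data and for `rf_tune$x`. As far as I understand, `resample()` only estimates performance and does not return one final model, which is why the model is fit once on all of the training data before predicting on the external set.

```r
library(mlr)

# Hypothetical toy data standing in for my real training and external sets
set.seed(1)
make_toy <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  died <- as.integer(x1 + rnorm(n) > 0)
  data.frame(x1 = x1, x2 = x2, DIED30 = factor(died, levels = c("0", "1")))
}
trainsample <- make_toy(200)
externalsample <- make_toy(100)

trainTask <- makeClassifTask(data = trainsample, target = "DIED30", positive = "1")

# Stand-in for the tuned learner (values assumed here, not taken from rf_tune$x)
rf.tree <- makeLearner("classif.randomForest", predict.type = "prob",
                       par.vals = list(ntree = 500, mtry = 1, nodesize = 5))

# 10-fold CV estimate of the AUC for these hyperparameters (internal validation)
set_cv <- makeResampleDesc("CV", iters = 10L)
cv_res <- resample(rf.tree, trainTask, set_cv, measures = auc, show.info = FALSE)
cv_res$aggr

# Single final model fit on all of the training data
rforest <- train(rf.tree, trainTask)

# External validation: predict on data the model has never seen
pexternal <- predict(rforest, newdata = externalsample)
ext_auc <- performance(pexternal, measures = auc)
ext_auc
```

With my real data, `rf.tree` would come from `setHyperPars(rf, par.vals = rf_tune$x)` as in the code above.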
Thanks in advance for your time.