Caret train function with parameter tuning and multiprocessing

Time: 2019-01-28 15:18:22

Tags: r multithreading random-forest r-caret

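I am trying to tune the parameters of a random forest with caret in R. It works sequentially, but as soon as I try multicore processing it stops working. My operating system is Windows.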

library(caret)
library(randomForest)

# Define a custom caret model that tunes mtry, ntree, and nodesize
customRF <- list(type = "Classification", library = "randomForest", loop = NULL)
customRF$parameters <- data.frame(parameter = c("mtry", "ntree","nodesize"), class = rep("numeric", 3), label = c("mtry", "ntree","nodesize"))
customRF$grid <- function(x, y, len = NULL, search = "grid") {}
customRF$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
  randomForest(x, y, mtry = param$mtry, ntree=param$ntree, nodesize=param$nodesize, ...)
}
customRF$predict <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
  predict(modelFit, newdata)
customRF$prob <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
  predict(modelFit, newdata, type = "prob")
customRF$sort <- function(x) x[order(x[,1]),]
customRF$levels <- function(x) x$classes

# Load the parallel backend
library("doParallel")

# Resampling setup and the grid of parameter values to tune
Sys.time()
control <- trainControl(method = "repeatedcv", number = 10, classProbs = TRUE, summaryFunction = twoClassSummary)
tunegrid <- expand.grid(.mtry = c(1, 5, 11), .ntree = c(20, 30, 40), .nodesize = c(1, 5, 10))
set.seed(123)

# Train the model in parallel
registerDoParallel(5)
getDoParWorkers()

Sys.time()
custom <- caret::train(churn2 ~ ., data = train05, method = customRF, metric = "ROC", tuneGrid = tunegrid,
                       trControl = control, allowParallel = TRUE)
Sys.time()

# Switch back to sequential processing
registerDoSEQ()

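With parallel processing, I get the following message:

Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
  There were missing values in resampled performance measures.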

My results are also wrong: every attempt gives the same AUC. This does not happen without multicore processing.

I have looked at other questions, such as Warning message: "missing values in resampled performance measures" in caret train() using rpart and parRF on caret not working for more than one core, but they do not solve my problem.
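The question has no confirmed answer, so the following is only a sketch of two things that may be worth checking, not a verified diagnosis. It assumes (1) that the PSOCK workers used on Windows have the randomForest namespace loaded but not attached, so the bare `randomForest()` call inside `customRF$fit` may not be found there, and (2) that `allowParallel` belongs in `trainControl()`, where caret documents it, rather than in `train()`, where it is forwarded through `...` to the fit function. `caret::twoClassSim()` provides simulated stand-in data, not the asker's `train05`/`churn2`, and the grid and worker count are reduced to keep the run short.

```r
library(caret)
library(randomForest)
library(doParallel)

# Same custom model as in the question, except that fit() calls
# randomForest through its namespace (assumption: the bare name may not
# be visible on the parallel workers).
customRF <- list(type = "Classification", library = "randomForest", loop = NULL)
customRF$parameters <- data.frame(
  parameter = c("mtry", "ntree", "nodesize"),
  class = rep("numeric", 3),
  label = c("mtry", "ntree", "nodesize")
)
customRF$grid <- function(x, y, len = NULL, search = "grid") {}
customRF$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
  randomForest::randomForest(x, y, mtry = param$mtry, ntree = param$ntree,
                             nodesize = param$nodesize, ...)
}
customRF$predict <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
  predict(modelFit, newdata)
customRF$prob <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
  predict(modelFit, newdata, type = "prob")
customRF$sort <- function(x) x[order(x[, 1]), ]
customRF$levels <- function(x) x$classes

# Stand-in data (hypothetical): a simulated two-class problem with 15 predictors
set.seed(123)
dat <- twoClassSim(200)

# allowParallel is set in trainControl() here, not in train()
control <- trainControl(method = "repeatedcv", number = 10, classProbs = TRUE,
                        summaryFunction = twoClassSummary, allowParallel = TRUE)
tunegrid <- expand.grid(.mtry = c(1, 11), .ntree = c(20, 40), .nodesize = c(1, 10))

# Explicit cluster so it can be shut down cleanly afterwards
cl <- makePSOCKcluster(2)
registerDoParallel(cl)

set.seed(123)
custom <- caret::train(Class ~ ., data = dat, method = customRF, metric = "ROC",
                       tuneGrid = tunegrid, trControl = control)

stopCluster(cl)
registerDoSEQ()

print(custom$results[, c("mtry", "ntree", "nodesize", "ROC")])
```

NA values in `custom$resample$ROC` would indicate that fits for the selected model still failed in some resamples, in which case the cause lies elsewhere.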

0 answers:

No answers