Registered doParallel cluster does not work with train / caret parRF model

Time: 2015-11-07 08:38:02

Tags: r parallel-processing random-forest r-caret

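I can't get parRF to work, even though other things such as parApply work fine.

I tried makeCluster as well as makePSOCKcluster and a few similar variants.

It keeps returning the error: task 1 failed - "could not find function "getDoParWorkers""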

cores_2_use <- detectCores() - 2
cl          <- makeCluster(cores_2_use, useXDR = F)
clusterSetRNGStream(cl, 9956)
registerDoParallel(cl, cores_2_use)


rf_train <- train(y=y, x=x,
               method='parRF', tuneGrid = data.frame(mtry = ncol(x)), na.action = na.omit,
               trControl=trainControl(method='oob',number=10, allowParallel = TRUE)
               )
Error in { : task 1 failed - "could not find function "getDoParWorkers""
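The error message says a worker could not find getDoParWorkers, which is a function from the foreach package. A minimal way to check whether foreach is attached on the workers is sketched below. It uses only the base parallel package; the two-worker cluster and the variable name foreach_attached are illustrative choices, not part of the original question.

```r
library(parallel)

# Start a small throwaway cluster (2 workers is an arbitrary choice).
cl <- makeCluster(2)

# Ask every worker whether foreach appears on its search path.
# The result is one logical value per worker.
foreach_attached <- unlist(clusterEvalQ(cl, "package:foreach" %in% search()))
print(foreach_attached)

stopCluster(cl)
```

If the values are FALSE, code running on the workers cannot call foreach functions by their bare names, which would be consistent with the error above.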

2 answers:

Answer 0: (score: 4)

I can reproduce your error message. Getting around it takes a bit of hacking. I'm not sure whether this is a bug or something else.

But I managed to get it working by copying the model and adjusting its fit function: I added require(foreach) inside the fit function.

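The strange thing is that once train has been run with the new parRF_mod as the method, the original train call that produced the error runs without any error. Starting from a clean session, the error appears again. So something is not behaving as it should.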
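One plausible explanation for this behaviour, offered as an assumption rather than a confirmed diagnosis: cluster workers are long-lived R processes, so a package attached during one task stays attached for every later task sent to the same worker. The sketch below illustrates that persistence with the base package splines as a stand-in for foreach; the single-worker cluster and the names before and after are illustrative.

```r
library(parallel)

cl <- makeCluster(1)  # one worker is enough to show the effect

# Is the package attached on the worker before anything loads it?
before <- clusterEvalQ(cl, "package:splines" %in% search())[[1]]

# A first task attaches the package, much like require(foreach) in the fit function.
invisible(clusterEvalQ(cl, require(splines)))

# A later, unrelated task on the same worker still sees the package.
after <- clusterEvalQ(cl, "package:splines" %in% search())[[1]]

print(c(before = before, after = after))

stopCluster(cl)
```

Under this assumption, the original parRF call succeeds afterwards only because the workers already have foreach attached, and a fresh session starts new workers without it.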

Here is the code that worked for me:

library(doParallel)

cl = makeCluster(parallel::detectCores()-1, type = "PSOCK")  # type = "SOCK" would require the snow package
registerDoParallel(cl) 
getDoParWorkers() 


library(caret)
library(randomForest)

y <- mtcars$mpg
x <- mtcars[, names(mtcars) != "mpg"]  # predictors only: drop the response column mpg


parRF_mod <- getModelInfo("parRF", regex = FALSE)[[1]]

parRF_mod$fit <- function (x, y, wts, param, lev, last, classProbs, ...) 
{
  # added the requirement of foreach
  require(foreach)
  workers <- getDoParWorkers()
  theDots <- list(...)
  theDots$ntree <- if (is.null(theDots$ntree)) 
    250
  else theDots$ntree
  theDots$x <- x
  theDots$y <- y
  theDots$mtry <- param$mtry
  theDots$ntree <- ceiling(theDots$ntree/workers)
  out <- foreach(ntree = 1:workers, .combine = combine) %dopar% 
  {
    library(randomForest)
    do.call("randomForest", theDots)
  }
  out$call["x"] <- "x"
  out$call["y"] <- "y"
  out
}

rf_train <- train(y=y, x=x,
                  method=parRF_mod,  tuneGrid = data.frame(mtry = ncol(x)), na.action = na.omit,
                  trControl=trainControl(method='oob',number=10, allowParallel = TRUE)
)


stopCluster(cl)

Answer 1: (score: 4)

Update: Topepo has updated the code on GitHub to fix this bug! Just run devtools::install_github("topepo/caret/pkg/caret")

My previous answer is deprecated.

Someone on GitHub also suggested this workaround:

# parallel
library(caret)
library(doParallel)
cl <- makePSOCKcluster(detectCores())
clusterEvalQ(cl, library(foreach))  # attach foreach on every worker
registerDoParallel(cl)
y <- mtcars$mpg
x <- mtcars[, names(mtcars) != "mpg"]  # predictors only: drop the response column mpg
#--------------------------------------------------------------
rf_train <- train(y=y, x=x,
              method='parRF', tuneGrid = data.frame(mtry = ncol(x)), na.action = na.omit,
              trControl=trainControl(method='oob',number=10, allowParallel = TRUE)
              )
rf_train
#--------------------------------------------------------------
stopCluster(cl)
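To see what the clusterEvalQ(cl, library(foreach)) line in the workaround does, the following sketch attaches foreach on each worker and then checks that getDoParWorkers can be found there. It assumes foreach is installed and visible to the workers; the two-worker cluster and the variable name visible are illustrative only.

```r
library(parallel)

cl <- makeCluster(2)  # 2 workers, chosen only for illustration

visible <- NULL
if (requireNamespace("foreach", quietly = TRUE)) {
  # Attach foreach on every worker, as in the workaround above.
  invisible(clusterEvalQ(cl, library(foreach)))
  # Ask each worker whether it can now find getDoParWorkers().
  visible <- unlist(clusterEvalQ(cl, exists("getDoParWorkers")))
  print(visible)
}

stopCluster(cl)
```

Because the package is attached before any model fitting starts, the fit function no longer depends on foreach being loaded inside each task.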

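Be sure to start from a fresh session before running this version of the code. After an earlier parRF attempt, this approach did not work for me even after calling stopCluster(cl) and stopImplicitCluster(), until I completely restarted R and RStudio.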