Question

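I'm trying to use foreach to run different classifiers on my data, but it doesn't work. In fact, it returns nothing. My goal is to parallelize my process. Here is a simplified version of my code: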

library(foreach)
library(doParallel)
no_cores <- detectCores() - 1
cl<-makeCluster(no_cores)
registerDoParallel(cl)
registerDoParallel(no_cores)

model_list<-foreach(i = 1:2, 
              .combine = c,.packages=c("e1071","randomeForest"))  %dopar%  
  if (i==1){
    model1<-svm(x = X,y = as.factor(Y),type = "C-classification",probability = T)
  }
  if (i==2){
    mode2<-randomForest(x = X,y = as.factor(Y), ntree=100, norm.votes=FALSE,importance = T)
}

Is my approach to parallelization correct overall? Thanks a lot.

Answer 1

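The main problem is that you didn't enclose the body of the foreach loop in curly braces. Because %dopar% is a binary operator, you have to be careful about precedence, which is why I recommend always using curly braces.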
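
To illustrate the precedence point, here is a minimal sketch that uses the sequential %do% operator (so no cluster is needed) and placeholder strings instead of real models. As far as I can tell, without braces only the first if statement becomes the loop body; a second if on the following lines would be parsed as a separate top-level statement that runs once, outside the loop.

```r
library(foreach)

# Without braces, the loop body is just this single `if` expression.
# Anything on the following lines would not be part of the loop.
res_no_braces <- foreach(i = 1:2) %do%
  if (i == 1) "svm"

# With braces, the whole block is the body, and each iteration
# returns the value of its last expression.
res_braces <- foreach(i = 1:2) %do% {
  if (i == 1) "svm" else "randomForest"
}

str(res_no_braces)
str(res_braces)
```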

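Also, you shouldn't use c as the combine function. Since svm and randomForest return objects, the default behaviour of returning the results in a list is appropriate. Combining them with c will give you garbage results.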
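
A small sketch of why c is a poor fit here. To keep it self-contained it uses two lm fits on the built-in mtcars data as stand-ins for the svm and randomForest objects (that substitution is my assumption, not part of the original code). The idea is the same: a model object is a list with a class attribute, and c concatenates the components and drops the class.

```r
library(foreach)

# Default behaviour: one intact model object per list element.
as_list <- foreach(i = 1:2) %do% {
  if (i == 1) lm(mpg ~ wt, data = mtcars) else lm(mpg ~ hp, data = mtcars)
}

# With .combine = c, the components of both fits are concatenated
# into one flat list that is no longer a model object.
flattened <- foreach(i = 1:2, .combine = c) %do% {
  if (i == 1) lm(mpg ~ wt, data = mtcars) else lm(mpg ~ hp, data = mtcars)
}

class(as_list[[1]])  # still a model object
class(flattened)     # a plain list of mixed components
```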

Finally, calling registerDoParallel twice doesn't make sense. It doesn't hurt, but it clutters your code.

I suggest:

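library(doParallel)  # also attaches foreach and parallel

# Leave one core free and register the parallel backend once
no_cores <- detectCores() - 1
registerDoParallel(no_cores)

# No .combine argument: the fitted models are returned in a list
model_list <- foreach(i = 1:2,
              .packages = c("e1071", "randomForest")) %dopar% {
  if (i == 1) {
    svm(x = X, y = as.factor(Y), type = "C-classification",
        probability = TRUE)
  } else {
    randomForest(x = X, y = as.factor(Y), ntree = 100, norm.votes = FALSE,
                 importance = TRUE)
  }
}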

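I also removed the two unnecessary variable assignments to model1 and model2. Those variables won't be defined correctly on the master, and they obscure how the foreach loop really works.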
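
The point about assignments can be sketched as follows, assuming a cluster of two local workers. The variable name worker_only is hypothetical and only serves the illustration. The assignment happens in the worker process, so only the value returned by the body reaches the master.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)      # two workers are enough for this illustration
registerDoParallel(cl)

results <- foreach(i = 1:2) %dopar% {
  worker_only <- i * 10   # hypothetical name; assigned on the worker only
  worker_only             # the last value is what foreach returns
}

stopCluster(cl)

exists("worker_only")     # the variable was never created on the master
results[[2]]              # results are retrieved from the returned list
```

In the same way, the fitted models in the suggested code would be accessed as model_list[[1]] and model_list[[2]].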

Using "foreach" to run different classifiers in R
