Question

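I have a very large list (huge_list). For each element of the list I call a function (inner_fun). inner_fun takes about 0.5 seconds, and its output is a simple numeric vector of size 3. I am trying to parallelize this. After going through many articles, I found it mentioned that when the parallelized function is very fast, it is better to split the work into chunks, so I split it according to the number of cores. But there is no time benefit, and I cannot understand the concept here. Can anyone provide some insight? My main concern is whether I have got the code wrong. I have not posted the exact code here, but I have tried to replicate it as closely as possible.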

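A few observations:

dummy_fun and dummy_fun2 both take about 10 hours with parallel set to 11.
Without parallelism, it takes about 20 hours.
With parallel = 2, it finishes in 15 hours.
I am using a 12-core, 60 GB RAM Ubuntu machine.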
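
The timings above can be turned into speedup and parallel efficiency with a rough calculation. This is only a sketch based on the approximate hours reported here; the variable names are illustrative and not part of the original code.

```r
# Approximate wall-clock hours reported above.
t_seq   <- 20
t_par   <- c(workers_2 = 15, workers_11 = 10)
workers <- c(workers_2 = 2,  workers_11 = 11)

speedup    <- t_seq / t_par      # how many times faster than sequential
efficiency <- speedup / workers  # fraction of the ideal linear speedup

round(rbind(speedup, efficiency), 2)
```

With 11 workers the run is only about twice as fast as the sequential one, so most of the added workers contribute very little.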

Code to create the cluster

library(doParallel)  # also attaches parallel and foreach

no_of_clusters <- detectCores() - 1
cl <- makeCluster(no_of_clusters)
registerDoParallel(cl)
clusterExport(cl, varlist = c("arg1", "arg2", "inner_fun"))

Function without chunks

dummy_fun <- function(arg1, arg2, huge_list) {
  g <- foreach(i = 1:length(huge_list), .combine = rbind,
               .multicombine = TRUE) %dopar% {
    inner_fun(i, arg1, arg2, huge_list[i])
  }
  return(g)
}

Function with chunks

dummy_fun2 <- function(arg1, arg2, huge_list) {
  il <- 1:length(huge_list)
  il2 <- split(il, ceiling(seq_along(il) / (length(il) / (detectCores() - 1))))
  g <- foreach(i = il2, .combine = rbind, .multicombine = TRUE) %dopar% {
    # i is one chunk of indices; li is a single index within that chunk
    ab1 <- lapply(i, function(li) {
      inner_fun(li, arg1, arg2, huge_list[li])
    })
    do.call(rbind, ab1)
  }
  return(g)
}

Answer 1

You have got it wrong. It does not split the indices into chunks of length no_of_clusters; it splits them into no_of_clusters chunks.
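
The difference between the two ways of chunking can be seen on a toy example. This is a minimal sketch in base R; L and ncores are small made-up values chosen only for illustration.

```r
L      <- 10  # toy number of list elements (assumption)
ncores <- 3   # toy number of workers (assumption)
idx    <- seq_len(L)

# Chunks of LENGTH ncores: the number of chunks grows with L.
by_length <- split(idx, ceiling(idx / ncores))

# Exactly ncores chunks of near-equal size.
by_count <- split(idx, sort(rep_len(seq_len(ncores), L)))

lengths(by_length)  # sizes of the chunks when splitting by length
lengths(by_count)   # sizes of the chunks when splitting by count
```

Splitting by length produces many small tasks when the list is large, whereas splitting by count gives each worker one large task.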

Try this:

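library(foreach)  # provides foreach() and %dopar%

dummy_fun2 <- function(arg1, arg2, huge_list, inner_fun, ncores) {

  # Start the cluster inside the function and stop it again on exit
  cl <- parallel::makeCluster(ncores)
  doParallel::registerDoParallel(cl)
  on.exit(parallel::stopCluster(cl), add = TRUE)

  # Split the indices into ncores chunks of near-equal size
  L <- length(huge_list)
  inds <- split(seq_len(L), sort(rep_len(seq_len(ncores), L)))

  # One parallel task per chunk; each chunk is processed sequentially
  foreach(l = seq_along(inds), .combine = rbind) %dopar% {
    ab1 <- lapply(inds[[l]], function(i) {
      inner_fun(i, arg1, arg2, huge_list[i])
    })
    do.call(rbind, ab1)
  }
}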
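
Before starting a cluster, it can help to check the chunking structure sequentially. The sketch below uses only base R and compares a chunked run against a plain loop. toy_inner_fun and toy_list are hypothetical stand-ins for the real inner_fun and huge_list, and the list elements are accessed with [[i]] here.

```r
# Hypothetical stand-in for inner_fun: returns a numeric vector of length 3.
toy_inner_fun <- function(i, arg1, arg2, x) c(i, arg1 + arg2, x[[1]])

toy_list <- as.list(seq(10, 70, by = 10))  # 7 toy elements (assumption)
ncores   <- 3                              # stands in for the number of workers

inds <- split(seq_along(toy_list),
              sort(rep_len(seq_len(ncores), length(toy_list))))

# Chunked: rbind within each chunk, then rbind the per-chunk matrices.
chunked <- do.call(rbind, lapply(unname(inds), function(chunk) {
  do.call(rbind, lapply(chunk, function(i) {
    toy_inner_fun(i, 1, 2, toy_list[[i]])
  }))
}))

# Unchunked reference result.
plain <- do.call(rbind, lapply(seq_along(toy_list), function(i) {
  toy_inner_fun(i, 1, 2, toy_list[[i]])
}))

identical(chunked, plain)  # both should hold the same rows in the same order
```

If the two results match, replacing the outer lapply with foreach and %dopar% should only change where the chunks are computed, not what is returned.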

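Further remarks:

Using more than half of the cores on your machine is usually useless.
The .multicombine option is used automatically with rbind. However, .maxcombine is really important (you need more than 100). Here we use lapply for the sequential part, so this remark does not matter.
Having lots of exports is useless when using foreach; it already exports what is needed from the environment of dummy_fun2.
Are you sure you want to use huge_list[i] (which gets a list of one element) rather than huge_list[[i]] (which gets the i-th element of the list)?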
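
The last remark can be illustrated with a toy list. This is a minimal sketch; the contents of huge_list here are made up for illustration.

```r
huge_list <- list(a = 1:3, b = 4:6)  # toy list (assumption)

one_elem_list <- huge_list[1]    # a list of length 1 wrapping the first element
first_elem    <- huge_list[[1]]  # the first element itself (an integer vector)

class(one_elem_list)   # class of the single-bracket result
class(first_elem)      # class of the double-bracket result
length(one_elem_list)  # number of list entries
length(first_elem)     # number of values in the vector
```

Which form is right depends on whether inner_fun expects a list or the element itself.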

Parallel processing in chunks does not bring any performance benefit
