Question

I am trying to learn foreach to parallelize my task.

My for loop looks like this:

     # create an empty matrix to store results
     mat <- matrix(-9999, nrow = length(unique(dat$mun)), ncol = 2)
     mat[, 1] <- unique(dat$mun)  # first column holds the municipality codes

     for(mun in unique(dat$mun)) {

           dat <- read.csv(paste0("data",mun,".csv"))
           tot.dat <- sum(dat$x)
           mat[mat[,1]== mun,2] <- tot.dat
     }

The length of unique(dat$mun) is 5563.

I would like to use foreach to parallelize this task.

      library(foreach)
      library(doParallel)

      # number of iterations
      iters <- 5563

      foreach(icount(iters)) %dopar% {
          mun <- unique(dat$mun)[mun] # this is where I cannot figure out how to assign mun so that it reads the data for mun

          dat <- read.csv(paste0("data",mun,".csv"))
          tot.dat <- sum(dat$x)
          mat[mat[,1]== mun,2] <- tot.dat
        }

Answer 1

This is one possible solution. Note that I am on Windows here and call registerDoParallel() so that it works.

library(foreach)
library(doParallel)

# number of iterations
iters <- 5563

registerDoParallel()
mun <- unique(dat$mun)

tableList <- foreach(i=1:iters) %dopar% {
  dat <- read.csv(paste0("data",mun[i],".csv"))
  tot.dat <- sum(dat$x)
}
unlist(tableList)

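Basically, whatever the {...} block returns is stored in a list. In this case the result (tot.dat, a single number) is collected in tableList, and calling unlist() converts it to a vector for further use.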
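As a small self-contained illustration of this behaviour, the sketch below writes three hypothetical CSV files to a temporary directory and sums their `x` column in parallel. The municipality codes, the temporary file location and the two-worker cluster are assumptions made only for the demo. It also shows foreach's `.combine = c` argument, which returns a vector directly and may make the `unlist()` step unnecessary.

```r
library(foreach)
library(doParallel)

# Hypothetical demo data: three small CSV files in a temporary directory
tmp <- tempdir()
mun <- c(11, 12, 13)
for (m in mun) {
  write.csv(data.frame(x = 1:3 * m),
            file.path(tmp, paste0("data", m, ".csv")),
            row.names = FALSE)
}

# Start two workers explicitly and register them as the %dopar% backend
cl <- makeCluster(2)
registerDoParallel(cl)

# Default behaviour: a list with one element per iteration
tableList <- foreach(i = seq_along(mun)) %dopar% {
  dat <- read.csv(file.path(tmp, paste0("data", mun[i], ".csv")))
  sum(dat$x)
}

# .combine = c concatenates the per-iteration results into a vector
totals <- foreach(i = seq_along(mun), .combine = c) %dopar% {
  dat <- read.csv(file.path(tmp, paste0("data", mun[i], ".csv")))
  sum(dat$x)
}

stopCluster(cl)

# Pair each municipality with its total, as the matrix in the question intended
mat <- cbind(mun = mun, total = totals)
```

Because each iteration returns its own value, nothing has to be written into a shared matrix from inside the loop; the matrix is assembled afterwards from the collected results.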

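The result of the {...} block can be anything: a single number, a vector, a data frame, and so on. Another approach to your problem is to combine all the existing data, tagging each row with its source file, so the intermediate step would look like this: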

library(plyr)
tableAll <- foreach(i=1:iters) %dopar% {
  dat <- read.csv(paste0("data",mun[i],".csv"))
  dat$source <- mun[i]
  dat  # return the data frame; the assignment above would only return mun[i]
}
rbind.fill(tableAll)

We can then use it for further analysis.
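One way such a follow-up analysis might look is sketched below: the per-file data frames are stacked and then summed per source. To keep the example runnable without any files or workers, it builds hypothetical data frames in memory and uses the sequential `%do%` operator with base R's `rbind` and `aggregate`; these substitutions are assumptions for the demo, not part of the answer above.

```r
library(foreach)

# Hypothetical stand-in for the per-file data frames read in the loop above
mun <- c(11, 12, 13)
tableAll <- foreach(i = seq_along(mun)) %do% {
  dat <- data.frame(x = 1:3 * mun[i])
  dat$source <- mun[i]
  dat  # the last expression is what foreach collects
}

# Stack the list of data frames, then total x within each source
combined <- do.call(rbind, tableAll)
totals <- aggregate(x ~ source, data = combined, FUN = sum)
```

Here `totals` has one row per source with the summed `x`, which is the same quantity the original for loop stored in the second column of `mat`.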
