Run a parallel loop that starts from the most recent point of a simulation vector

Date: 2018-12-14 17:08:53

Tags: r parallel-processing

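I am running a series of simulations that takes many hours to complete. Sometimes the computer crashes, or I need to stop the simulations to use the computer for something else. To deal with this, I wrote some code that checks the folder the simulation results are saved to and resumes from the last saved simulation. That way, interruptions don't matter.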
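The resume check described above comes down to comparing the IDs that are wanted with the IDs encoded in the saved file names. A minimal sketch in R, assuming results are saved as `X<ID>.rds` (the naming used in the code below); `remaining_ids` is a hypothetical helper name. Matching the whole file name with an anchored pattern, rather than stripping every non-digit character, keeps a stray file such as `notes2.txt` from being counted as a finished simulation 2.

```r
# Hypothetical helper: IDs from `ids` that have no "X<ID>.rds" result yet.
remaining_ids <- function(folder, ids) {
  # Only count files whose whole name has the expected "X<digits>.rds" form
  done_files <- list.files(folder, pattern = "^X[0-9]+\\.rds$")
  done_ids <- as.integer(sub("^X([0-9]+)\\.rds$", "\\1", done_files))
  setdiff(ids, done_ids)
}

# Usage example in a fresh temporary folder
folder <- file.path(tempdir(), "resume_demo")
unlink(folder, recursive = TRUE)
dir.create(folder)
saveRDS(1, file.path(folder, "X1.rds"))
saveRDS(3, file.path(folder, "X3.rds"))
writeLines("not a result", file.path(folder, "notes2.txt"))

todo <- remaining_ids(folder, 1:5)
print(todo)  # 2 4 5
```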

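However, because the process is slow, I would like to parallelize it while keeping the ability to stop it and restart it from the last completed simulation.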

Below is an example of the single-core process.

library(foreach)
library(doMC)
library(dplyr)
library(stringr)
registerDoMC(4)

# Create the output folder if it does not exist yet
folder <- "DeleteMe"

if (!dir.exists(folder)) {
  dir.create(folder)
}

# Generate the random numbers (one row per simulation)
set.seed(12)
RandNum <- data.frame(ID = 1:100, Random = sample(1:1000, 100))

for (i in 1:nrow(RandNum)) {

  # IDs of the simulations already saved in the folder
  CurrentSims <- list.files(folder) %>%
    str_replace_all("\\D", "")

  NeededSims <- RandNum$ID

  # The simulations still required in the current run
  NeededSims2 <- NeededSims[!(NeededSims %in% CurrentSims)]

  # Stop once nothing is left (min() of an empty vector returns Inf with a warning)
  if (length(NeededSims2) == 0) break

  # Minimum ID number of the remaining sims
  NextSim <- min(NeededSims2)

  # Replace this with some long and complicated process that saves to the target folder
  RandNum %>%
    filter(ID == NextSim) %>%
    saveRDS(file.path(folder, paste0("X", NextSim, ".rds")))

  print(i)
}
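One weak spot in this restart scheme: if the computer crashes while `saveRDS` is writing, a truncated `X<ID>.rds` can be left behind and counted as finished on the next run. A minimal sketch of a workaround, assuming a POSIX file system where renaming a file within one directory is atomic: write to a temporary name first and rename only after the write has completed. `save_result_atomically` is a hypothetical helper, not part of any package. The completion check must then ignore the temporary files; stripping all non-digits would turn `X8.rds.tmp` into `8`, so the sketch matches the full `^X[0-9]+\\.rds$` pattern instead.

```r
# Hypothetical helper: the final "X<ID>.rds" only appears once the object has
# been written completely. An interrupted write leaves "X<ID>.rds.tmp" behind,
# which a check using the pattern "^X[0-9]+\\.rds$" does not count as done.
save_result_atomically <- function(object, path) {
  tmp <- paste0(path, ".tmp")
  saveRDS(object, tmp)
  # On POSIX file systems a rename within one directory is atomic
  if (!file.rename(tmp, path)) {
    stop("could not rename ", tmp, " to ", path)
  }
  invisible(path)
}

# Usage example in a fresh temporary folder
folder <- file.path(tempdir(), "atomic_demo")
unlink(folder, recursive = TRUE)
dir.create(folder)

save_result_atomically(data.frame(ID = 7, Random = 42),
                       file.path(folder, "X7.rds"))

# Simulate a crash that left a half-written temporary file behind
writeLines("partial", file.path(folder, "X8.rds.tmp"))

done <- list.files(folder, pattern = "^X[0-9]+\\.rds$")
print(done)  # "X7.rds"
```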

My attempt at a parallel version seems to run the whole thing only 4 times, and it also throws an error:

registerDoMC(4)
foreach (i = 1:nrow(RandNum)) %dopar% {


  CurrentSims <- list.files(folder) %>%
  str_replace_all(., "\\D", "" )

  NeededSims <- RandNum$ID %>%
  str_replace_all(., pattern ="\\D", "" ) %>%
  as.numeric()

  #The simulations still required in current run
  NeededSims2 <- NeededSims[!(NeededSims %in% CurrentSims)]

  # Minimum ID number of remaining sims
  NextSim <- min(NeededSims2)

if (NextSim==Inf) break #Stops function making error on last iteration

  RandNum %>%
    filter(ID == NextSim) %>%
  saveRDS(., file.path(folder, paste0("X", NextSim, ".rds")))

  print(i)
}
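A likely reason the attempt above misbehaves: every worker lists the folder at roughly the same moment, before the others have saved anything, so they all compute the same `min(NeededSims2)` and run the same simulation; the loop index `i` never decides which simulation runs. In addition, `break` has no enclosing loop inside a `foreach` body, so reaching it raises an error. A common alternative is to work out the missing IDs once, before any worker starts, and hand each worker a distinct ID. The sketch below assumes a Unix-like system (as `doMC` does) and swaps in `mclapply` from base R's `parallel` package, whose fork-based multicore functionality `doMC` builds on; `run_one` and `todo_ids` are hypothetical names, with `run_one` standing in for the long simulation.

```r
library(parallel)

# Hypothetical stand-in for the long simulation: it handles exactly one ID
# and saves the result as "X<ID>.rds" in `folder`.
run_one <- function(id, folder, rand_num) {
  result <- rand_num[rand_num$ID == id, ]
  saveRDS(result, file.path(folder, paste0("X", id, ".rds")))
  id
}

# Fresh demo folder (use the real results folder in practice)
folder <- file.path(tempdir(), "parallel_demo")
unlink(folder, recursive = TRUE)
dir.create(folder)

set.seed(12)
RandNum <- data.frame(ID = 1:100, Random = sample(1:1000, 100))

# Decide once, in the main process, which IDs still need to be run
todo_ids <- function(folder, ids) {
  done_files <- list.files(folder, pattern = "^X[0-9]+\\.rds$")
  setdiff(ids, as.integer(sub("^X([0-9]+)\\.rds$", "\\1", done_files)))
}
todo <- todo_ids(folder, RandNum$ID)

# Each element of `todo` goes to exactly one worker, so no ID runs twice.
# mc.preschedule = FALSE forks one job per ID, which balances simulations of
# uneven length; forking is unavailable on Windows, hence mc.cores = 1 there.
cores <- if (.Platform$OS.type == "windows") 1L else 4L
finished <- mclapply(todo, run_one, folder = folder, rand_num = RandNum,
                     mc.cores = cores, mc.preschedule = FALSE)

# After an interruption, rerunning the script recomputes `todo` from the
# files on disk; here everything has finished, so nothing is left
print(length(todo_ids(folder, RandNum$ID)))  # 0
```

With the `foreach`/`doMC` setup above, the same idea would be to iterate directly over the missing IDs, e.g. `foreach(id = todo) %dopar% { ... }`, and drop both the per-iteration folder check and the `break`.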

How can I parallelize this process while still being able to stop it and then restart it without problems?

0 answers