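I am running a series of simulations that takes many hours to complete. Sometimes the computer crashes, or I need to stop the simulations to use the machine for something else. To deal with this, I wrote some code that checks the folder the results are saved to and picks up after the last saved simulation. That way interruptions don't matter.
However, because the process is slow, I would like to parallelise it while keeping the ability to stop the process and restart it from the last completed simulation.
Below is an example of the single-core process.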
library(foreach)
library(doMC)
library(dplyr)
library(stringr)
registerDoMC(4)

# Create the output folder if it does not exist yet
folder <- "DeleteMe"
if (!file.exists(folder)) {
  dir.create(folder)
}

# Generate seeds
set.seed(12)
RandNum <- data.frame(ID = 1:100, Random = sample(1:1000, 100))

for (i in 1:nrow(RandNum)) {
  # IDs of the simulations already saved, taken from the file names
  CurrentSims <- list.files(folder) %>%
    str_replace_all(., "\\D", "")
  NeededSims <- RandNum$ID %>%
    str_replace_all(., pattern = "\\D", "") %>%
    as.numeric()
  # The simulations still required in the current run
  NeededSims2 <- NeededSims[!(NeededSims %in% CurrentSims)]
  # Minimum ID number of the remaining sims (Inf, with a warning, when none remain)
  NextSim <- min(NeededSims2)
  if (NextSim == Inf) break # Stops the loop from erroring once everything is saved
  # Replace this with some long and complicated process that saves to the target folder
  RandNum %>%
    filter(ID == NextSim) %>%
    saveRDS(., file.path(folder, paste0("X", NextSim, ".rds")))
  print(i)
}
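The folder check in the loop above can also be done once, before the loop starts. The following is only a minimal sketch: `pending_sims` and `demo_folder` are hypothetical names that are not part of the code above, and it assumes result files are always named `X<ID>.rds` as in the example.

```r
# Hypothetical helper: the IDs in `ids` that have no X<ID>.rds file in `folder` yet.
pending_sims <- function(folder, ids) {
  # Only count files that follow the naming scheme used when saving
  files <- list.files(folder, pattern = "^X[0-9]+\\.rds$")
  # Strip everything that is not a digit to recover the ID
  done <- as.numeric(gsub("[^0-9]", "", files))
  setdiff(ids, done)
}

# Usage: pretend simulations 1 and 2 were saved before an interruption
demo_folder <- file.path(tempdir(), "pending_demo")
dir.create(demo_folder, showWarnings = FALSE)
saveRDS(1, file.path(demo_folder, "X1.rds"))
saveRDS(2, file.path(demo_folder, "X2.rds"))

# Only the missing IDs are visited, so no break is needed at the end
for (id in pending_sims(demo_folder, 1:5)) {
  saveRDS(id, file.path(demo_folder, paste0("X", id, ".rds")))
}
```

Because the list of remaining IDs is fixed before the loop, each iteration knows its own ID up front instead of re-reading the folder to find it.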
My attempt at a parallel version appears to run only 4 simulations in total, and it also throws an error.
registerDoMC(4)

foreach (i = 1:nrow(RandNum)) %dopar% {
  CurrentSims <- list.files(folder) %>%
    str_replace_all(., "\\D", "")
  NeededSims <- RandNum$ID %>%
    str_replace_all(., pattern = "\\D", "") %>%
    as.numeric()
  # The simulations still required in the current run
  NeededSims2 <- NeededSims[!(NeededSims %in% CurrentSims)]
  # Minimum ID number of the remaining sims
  NextSim <- min(NeededSims2)
  if (NextSim == Inf) break # Stops function making error on last iteration
  RandNum %>%
    filter(ID == NextSim) %>%
    saveRDS(., file.path(folder, paste0("X", NextSim, ".rds")))
  print(i)
}
How can I parallelise this process while still being able to stop it and then restart it without problems?
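One possible direction, offered as a sketch rather than a definitive answer: in the attempt above every worker lists the folder on its own, so several workers can pick the same `NextSim`, and `break` is, as far as I can tell, only valid inside `for`, `while` or `repeat`, which a `foreach` body is not. Working out the missing IDs once and giving each worker its own ID avoids both. The sketch swaps in `parallel::mclapply` from the packages shipped with R instead of `foreach`/`doMC`, purely to avoid extra dependencies. It relies on forking, so it assumes a Unix-like system. `run_sim` and `run_pending` are hypothetical names, and the row lookup stands in for the real simulation.

```r
library(parallel)

# Hypothetical worker: runs one simulation and saves its result.
run_sim <- function(id, seeds, folder) {
  result <- seeds[seeds$ID == id, ]  # placeholder for the long simulation
  final <- file.path(folder, paste0("X", id, ".rds"))
  # Write to a temporary name first, then rename, so that an interrupted
  # save is never mistaken for a finished simulation on restart
  tmp <- paste0(final, ".tmp")
  saveRDS(result, tmp)
  file.rename(tmp, final)
  id
}

# Hypothetical driver: runs only the simulations that have no result file yet.
run_pending <- function(seeds, folder, cores = 4) {
  dir.create(folder, showWarnings = FALSE)
  # The pattern skips leftover .tmp files from an interrupted run
  files <- list.files(folder, pattern = "^X[0-9]+\\.rds$")
  done <- as.numeric(gsub("[^0-9]", "", files))
  pending <- setdiff(seeds$ID, done)
  if (length(pending) == 0) return(invisible(pending))
  # mc.preschedule = FALSE hands out one ID at a time to whichever core is free
  mclapply(pending, run_sim, seeds = seeds, folder = folder,
           mc.cores = cores, mc.preschedule = FALSE)
  invisible(pending)
}

# Usage with the same seeds as in the question
set.seed(12)
RandNum <- data.frame(ID = 1:100, Random = sample(1:1000, 100))
run_pending(RandNum, file.path(tempdir(), "DeleteMe"))
```

Calling `run_pending` again after an interruption only reruns the IDs whose files are missing. The same `pending` vector could presumably be passed to `foreach(id = pending) %dopar% run_sim(id, seeds, folder)` with `registerDoMC(4)` registered as in the question.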