How can I shut down an R parallel cluster without the cluster variable?

Asked: 2017-01-16 14:57:39

Tags: r

Using the parallel R package, I can run things in parallel like this:

library(parallel)
cl <- makeCluster(2) # Create a cluster with 2 workers
# ... do some parallel stuff
stopCluster(cl)

However, the cl variable that references the cluster can be lost, for example when a function fails partway through:

do.something <- function() {
    library(parallel)
    cl <- makeCluster(detectCores())
    parLapply(cl, 1:10, function(x) {
        stop("An error occurred")
    })
    stopCluster(cl)
}
do.something()

Here, stopCluster is never executed. When this happens, the workers are left running, as ps shows:

501 53300  9225   0  2:16PM ttys003    0:00.27 /opt/local/Library/Frameworks/R.framework/Resources/bin/exec/R
501 53390     1   0  2:19PM ttys003    0:00.16 /opt/local/Library/Frameworks/R.framework/Resources/bin/exec/R --slave --no-restore -e parallel:::.slaveRSOCK() --args MASTER=localhost PORT=11099 OUT=/dev/null TIMEOUT=2592000 XDR=TRUE
501 53399     1   0  2:19PM ttys003    0:00.16 /opt/local/Library/Frameworks/R.framework/Resources/bin/exec/R --slave --no-restore -e parallel:::.slaveRSOCK() --args MASTER=localhost PORT=11099 OUT=/dev/null TIMEOUT=2592000 XDR=TRUE
501 53408     1   0  2:19PM ttys003    0:00.16 /opt/local/Library/Frameworks/R.framework/Resources/bin/exec/R --slave --no-restore -e parallel:::.slaveRSOCK() --args MASTER=localhost PORT=11099 OUT=/dev/null TIMEOUT=2592000 XDR=TRUE
501 53417     1   0  2:19PM ttys003    0:00.16 /opt/local/Library/Frameworks/R.framework/Resources/bin/exec/R --slave --no-restore -e parallel:::.slaveRSOCK() --args MASTER=localhost PORT=11099 OUT=/dev/null TIMEOUT=2592000 XDR=TRUE

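Of course, I can kill the workers by hand one by one, or restart R. But sometimes that is not practical, for example when several R instances are each running their own pool. Is there a way to stop the workers from within R once cl has been lost? How do people usually handle this situation?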

1 answer:

Answer 0 (score: 6)

There are a few mechanisms for making code always run, even when an error occurs:

try

Wrap the error-prone part in a try block. You can then inspect the result to see whether an error occurred.

do.something <- function() {
    library(parallel)
    cl <- makeCluster(detectCores())
    # try() catches the error so that execution continues to stopCluster()
    result <- try({
        parLapply(cl, 1:10, function(x) {
            stop("An error occurred")
        })
    })
    if (inherits(result, "try-error")) print("there was an error!")
    stopCluster(cl)
    result
}
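The same cleanup can also be expressed with base R's tryCatch and its finally argument, which is evaluated whether or not the expression fails. The following is only a sketch: the choice of 2 workers and of returning NULL on failure are assumptions made for illustration, not part of the original answer.

```r
library(parallel)

do.something <- function() {
    cl <- makeCluster(2)  # 2 workers: an arbitrary choice for this sketch
    tryCatch({
        parLapply(cl, 1:10, function(x) {
            stop("An error occurred")
        })
    }, error = function(e) {
        # Report the failure; returning NULL here is an assumption
        message("there was an error: ", conditionMessage(e))
        NULL
    }, finally = {
        # Evaluated on both the success path and the error path
        stopCluster(cl)
    })
}

result <- do.something()
```

Unlike try, this version lets the error handler decide what the function returns when something goes wrong.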

on.exit

Code in an on.exit call always runs when the function ends, whether it exits cleanly or because of an error.
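A minimal sketch of the on.exit approach, applied to the function from the question. The worker count of 2 and the outer try call are assumptions added for illustration; the key point is registering stopCluster immediately after the cluster is created.

```r
library(parallel)

do.something <- function() {
    cl <- makeCluster(2)  # 2 workers: an arbitrary choice for this sketch
    # Register the cleanup right away, so it runs whether the function
    # returns normally or stops with an error.
    on.exit(stopCluster(cl), add = TRUE)
    parLapply(cl, 1:10, function(x) {
        stop("An error occurred")
    })
}

# The error still propagates to the caller, but the cleanup has run by then.
# The outer try() is only here so the script can continue.
result <- try(do.something(), silent = TRUE)
```

Because the cleanup is registered inside the function, the caller never needs access to cl, which sidesteps the problem of a lost cluster variable.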