How to set .libPaths (checkpoint) on workers when running parallel computations in R

Time: 2018-09-11 12:35:32

Tags: r foreach parallel-processing checkpoint r-future

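I use the checkpoint package for reproducible data analysis. Some computations take a long time, so I want to run them in parallel. When running in parallel, however, the checkpoint is not set on the workers, so I get the error message "there is no package called xy" (because the package is not installed in my default library directory).

How can I make sure that every worker uses the package versions from the checkpoint folder? I tried setting .libPaths inside the foreach code, but that does not seem to work. I would also prefer to set the checkpoint / libPaths once globally rather than in every foreach call.

Another option would be to change the .Rprofile file, but I would rather not do that.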

checkpoint::checkpoint("2018-06-01")

library(foreach)
library(doFuture)
library(future)

doFuture::registerDoFuture()
future::plan("multisession")

l <- .libPaths()

# Code to run in parallel does not make much sense of course but I wanted to keep it simple.
res <- foreach::foreach(
  x = unique(iris$Species),
  lib.path = l
) %dopar% {
  .libPaths(lib.path)
  stringr::str_c(x, "_")
}
  

Error in { : task 2 failed - "there is no package called 'stringr'"

1 answer:

Answer 0: (score: 2)

Author of the future package here.

It should be sufficient to pass the main R process's library paths down as a global variable libs and set them on each worker with:

.libPaths(libs)

FYI, it is on the roadmap for future to make it easier to pass down the library path(s) to workers.

My details:

## Use CRAN checkpoint from 2018-07-24 to get future (>= 1.9.0) [1],
## otherwise the stdout below won't be relayed back to the master
## R process. Setting .libPaths() also works in older versions of
## the future package.
## [1] https://cran.microsoft.com/snapshot/2018-07-24/web/packages/future
checkpoint::checkpoint("2018-07-24")
stopifnot(packageVersion("future") >= "1.9.0")

libs <- .libPaths()
print(libs)
### [1] "/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1"
### [2] "/home/hb/.checkpoint/R-3.5.1"                                 
### [3] "/usr/lib/R/library"

library(foreach)

doFuture::registerDoFuture()
future::plan("multisession")

res <- foreach::foreach(x = unique(iris$Species)) %dopar% {
  ## Use the same library paths as the master R session
  .libPaths(libs)

  cat(sprintf("Library paths used by worker (PID %d):\n", Sys.getpid()))
  cat(sprintf(" - %s\n", sQuote(.libPaths())))

  stringr::str_c(x, "_")
}

###  - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
###   - ‘/home/hb/.checkpoint/R-3.5.1’
###   - ‘/usr/lib/R/library’
### Library paths used by worker (PID 9394):
###  - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
###   - ‘/home/hb/.checkpoint/R-3.5.1’
###   - ‘/usr/lib/R/library’
### Library paths used by worker (PID 9412):
###  - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
###   - ‘/home/hb/.checkpoint/R-3.5.1’
###   - ‘/usr/lib/R/library’

str(res)
### List of 3
###  $ : chr "setosa_"
###  $ : chr "versicolor_"
###  $ : chr "virginica_"
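
The question also asks for a way to set the library paths once rather than inside every foreach call. The following is only a minimal sketch of one possible approach, not something confirmed by the answer above: it assumes a PSOCK cluster created with base R's parallel package, and it sets the paths a single time on each worker with clusterCall(). The cluster size of 2 and the variable names cl and worker_libs are chosen for illustration.

```r
library(parallel)

## Library paths of the main R session. After checkpoint::checkpoint()
## these would include the checkpoint folder; here we simply take
## whatever the current session uses.
libs <- .libPaths()

## Start two background R workers (assumption: a local PSOCK cluster).
cl <- makeCluster(2L)

## Set the library paths ONCE on every worker. The setting persists for
## the lifetime of each worker process.
invisible(clusterCall(cl, function(paths) .libPaths(paths), libs))

## Ask each worker which library paths it now uses.
worker_libs <- clusterEvalQ(cl, .libPaths())
str(worker_libs)

stopCluster(cl)
```

If this works in your setup, such a cluster could presumably be handed to future via future::plan(future::cluster, workers = cl) before stopping it, so that later %dopar% calls run on workers whose library paths are already set. Whether that fits a given checkpoint workflow is something to verify locally.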