Question

Trouble understanding the scope of variables / passing them to functions when working with the parallel package

library(parallel)

test <- function(a = 1){
  no_cores <- detectCores()-1
  clust <- makeCluster(no_cores)
  result <- parSapply(clust, 1:10, function(x){a + x})
  stopCluster(clust)
  return(result)
}

test()
[1]  2  3  4  5  6  7  8  9 10 11

x = 1
test(x)

Error in checkForRemoteErrors(val) : 
3 nodes produced errors; first error: object 'x' not found

test() works, but test(x) does not. When I modify the function as shown below, it works:

test <- function(a = 1){
  no_cores <- detectCores()-1
  clust <- makeCluster(no_cores)
  y = a
  result <- parSapply(clust, 1:10, function(x){y + x})
  stopCluster(clust)
  return(result)
}

x = 1
test(x)

Can someone explain what is happening in memory?

Answer 1

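This is due to lazy evaluation. The argument a is not evaluated when the function is called, only when it is first used. In the first case, the cluster does not know x, because a has not yet been evaluated in the parent environment: the closure sent to the workers still holds an unevaluated promise for x, and each worker tries to look x up in its own global environment, where it does not exist. You can fix this by forcing the evaluation: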
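A minimal sketch of that fix, assuming the answer's missing code called force(a) at the top of the function. The fixed two-worker cluster and the on.exit() cleanup are illustrative choices, not part of the question:

```r
library(parallel)

test <- function(a = 1) {
  # Evaluate the promise for `a` now, in the master session, so the closure
  # shipped to the workers carries the value of `a`, not the unevaluated symbol x
  force(a)
  # Two workers keep the sketch light; the question uses detectCores() - 1
  clust <- makeCluster(2)
  # Shut the workers down even if parSapply() fails
  on.exit(stopCluster(clust), add = TRUE)
  parSapply(clust, 1:10, function(x) a + x)
}

x <- 1
test(x)
```

The y = a line in the question works for the same reason: the assignment forces the promise before the closure is serialized. test() works even without forcing, because the default value 1 is a constant that needs nothing from the caller's environment.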

Answer 2

I would rather use foreach() instead of parSapply():

library(doParallel)

test <- function(a = 1) {
  no_cores <- detectCores() - 1
  registerDoParallel(clust <- makeCluster(no_cores))
  on.exit(stopCluster(clust), add = TRUE)
  foreach(x = 1:10, .combine = 'c') %dopar% { a + x }
}

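With foreach(), there is no need to force the evaluation of a. Moreover, you can register the parallel backend outside the function if you prefer.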
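A sketch of registering the backend outside the function, assuming the doParallel package is installed. The two-worker cluster and the registerDoSEQ() reset at the end are illustrative choices, not from the answer:

```r
library(doParallel)  # also attaches foreach and parallel

# The function only describes the loop; it no longer manages a cluster
test <- function(a = 1) {
  foreach(x = 1:10, .combine = "c") %dopar% { a + x }
}

# Register the backend once, outside the function
clust <- makeCluster(2)
registerDoParallel(clust)

x <- 1
res <- test(x)  # no force(a) needed: foreach exports the value of `a` to the workers
print(res)

# Release the workers and fall back to sequential execution
stopCluster(clust)
registerDoSEQ()
```

Keeping cluster setup out of test() means repeated calls reuse the same workers instead of starting and stopping a cluster each time.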

See the tutorial on using foreach() there (disclaimer: I am the author of the tutorial).
