Question

I have a reproducible script that uses a cluster. It is not optimized; it serves as a practical exercise for studying how to apply a cluster.

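Brief description of the task:

There is a vector of variable length;
it needs to be transformed in a specific way: every element that is smaller than the (already transformed) element before it is replaced by that element, so the output never decreases.

Initial vector - 235416749 ......

Expected output - 235556779 ......

The transformation is performed by an application that uses a cluster (see the script below).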
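As a small illustration of the rule, and assuming the digits in the example above are the individual elements of the vector, the expected output matches what base R's `cummax` (running maximum) returns:

```r
# Assumption: each digit of "235416749" is one element of the initial vector
initial <- c(2L, 3L, 5L, 4L, 1L, 6L, 7L, 4L, 9L)

# cummax() replaces every element with the largest value seen so far
expected <- cummax(initial)
print(expected)
```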

The cluster output (sample) looks like this:

str(chunkSplit) # initial chunks
List of 4
 $ : int [1:24950] 56583 22166 49905 20040 60870 49899 85589 96478 81119 36474
 $ : int [1:25050] 22652 31943 57699 58051 33846 27328 34429 84989 16295 29308
 $ : int [1:25050] 42641 69941 86274 86395 7499 62027 91978 55004 73069 1528
 $ : int [1:24950] 57401 60632 6284 43612 40011 31096 9494 24453 81221 99553

str(cluster_Output) # updated chunks
List of 4
 $ : int [1:24950] 56583 56583 56583 56583 60870 60870 85589 96478 96478 96478
 $ : int [1:25050] 22652 31943 57699 58051 58051 58051 58051 84989 84989 84989
 $ : int [1:25050] 42641 69941 86274 86395 86395 86395 91978 91978 91978 91978
 $ : int [1:24950] 57401 60632 60632 60632 60632 60632 60632 60632 81221 99553

Inside each chunk, the transformation follows the logic described above.

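Process description

The script is expected to produce a correct overall output (assembled from the individual chunks). Each subsequent chunk must therefore be transformed taking into account the result of the previous one (the first element of the second chunk >= the last element of the first chunk).

However, the chunks are not transformed this way, because the variable max_val does not carry the correct value from one chunk to the next.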
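To make the intended chaining concrete, here is a minimal sequential sketch (no cluster involved) of how a carry value could flow from one chunk to the next. The helper `transform_chunk` and the toy chunks are hypothetical and only serve as an illustration:

```r
# Hypothetical helper: running maximum of one chunk, lifted to at least `carry`
transform_chunk <- function(chunk, carry) pmax(cummax(chunk), carry)

# Made-up toy data: the example vector split into three chunks
chunks <- list(c(2L, 3L, 5L, 4L), c(1L, 6L, 7L), c(4L, 9L))

carry <- -Inf                          # nothing precedes the first chunk
out <- vector("list", length(chunks))
for (k in seq_along(chunks)) {
  out[[k]] <- transform_chunk(chunks[[k]], carry)
  # the last element of a transformed chunk is the maximum seen so far
  carry <- out[[k]][length(out[[k]])]
}
print(unlist(out))
```

Each iteration depends on the `carry` left behind by the previous one, which is exactly the dependency that has to survive when the chunks are handed to separate workers.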

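Question:
I suppose this happens because each worker creates its own environment. I would be very grateful for advice on how to solve this within the cluster (transferring data between chunks inside the cluster).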
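The supposition can be checked with a small experiment. The sketch below uses a hypothetical `counter` variable in place of `max_val` and two workers chosen only for illustration. Each worker created by `makeCluster` is a separate R process, so `<<-` inside the applied function changes the copy held by that worker, not the master's variable and not the copies on other workers:

```r
library(parallel)

cl <- makeCluster(2)          # two workers, chosen only for illustration
counter <- 0                  # hypothetical stand-in for max_val
clusterExport(cl, "counter", envir = environment())  # each worker gets a copy

# Four tasks; every call increments the copy visible on its own worker
seen <- clusterApply(cl, 1:4, function(i) {
  counter <<- counter + 1
  counter
})
stopCluster(cl)

print(unlist(seen))  # counts are kept per worker, not 1..4 across all tasks
print(counter)       # the master's copy is still 0
```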

Reproducible script

# Test cluster application
library(parallel)


# the function to be applied in the cluster
dochunks <- function(x) {

  # chunk counter
  step2 <<- step2 + 1
  xxx <- unlist(x)
  x_end <- length(xxx)

  # from the 2nd chunk onwards: raise the 1st element to max_val
  # (the last value of the previous chunk) if it is smaller
  if (step2 >= 2 && max_val > xxx[1]) xxx[1] <- max_val

  # carry the running maximum forward through the chunk
  for (i in 1:x_end) {
    if ((i + 1) <= x_end && xxx[i] > xxx[i + 1]) {
      xxx[i + 1] <- xxx[i]
    }
  }

  max_val <<- xxx[x_end]
  return(xxx)
}


# length of the test vector
size <- 100000
# test vector: `size` random integers drawn from 1..100000
myvector <- sample(100000, size, replace = TRUE)
# set up workers to process the vector
workers <- 4
# set up the cluster
cls <- makeCluster(workers)
# split test vector into chunks with the number of workers
chunkSplit <- clusterSplit(cls, myvector)

# last (maximum) value of the previously processed chunk
max_val <- 0
# temporary vector for each chunk (not exported; dochunks uses a local xxx)
xxx <- 0
# counter of chunks processed so far
step2 <- 0
# export vars into the cluster
clusterExport(cls, varlist = c("max_val", "step2"), envir = environment())
# run the cluster
cluster_Output <- clusterApply(cls, chunkSplit, dochunks)
# quit the cluster
stopCluster(cls)

# compare initial and updated chunks
str(chunkSplit) # initial chunks
str(cluster_Output) # output chunks
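One possible way to get a consistent overall result without sharing state between workers is a two-pass scheme. This is only a sketch, not a definitive fix: it assumes the transformation is a running maximum (so `cummax` can replace the loop in `dochunks`), and the seed, vector length and worker count are illustrative values that are not part of the script above. The workers compute the running maximum of their own chunk, and the master then lifts every chunk to the largest value found in the chunks before it:

```r
library(parallel)

set.seed(1)                                   # illustrative seed only
myvector <- sample(100000, 1000, replace = TRUE)

cls <- makeCluster(2)                         # illustrative worker count
chunkSplit <- clusterSplit(cls, myvector)

# Pass 1 (parallel): running maximum inside each chunk, no shared state
local_max <- parLapply(cls, chunkSplit, cummax)
stopCluster(cls)

# Pass 2 (master): the carry into chunk k is the largest value of chunks 1..k-1
chunk_last <- vapply(local_max, function(ch) ch[length(ch)], numeric(1))
carry <- c(-Inf, cummax(chunk_last)[-length(chunk_last)])
cluster_Output <- Map(function(ch, cr) pmax(ch, cr), local_max, carry)

# The stitched chunks should agree with the running maximum of the whole vector
print(all(unlist(cluster_Output) == cummax(myvector)))
```

The second pass touches only one number per chunk before the final `pmax`, so the sequential part stays small compared with the per-chunk work done in parallel.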

Passing external values between chunks in clusterApply (parallel package)

0 answers: