R parallel processing error: `Error in checkForRemoteErrors(val) : 6 nodes produced errors; first error: subscript out of bounds`

Date: 2018-09-20 12:17:11

Tags: r for-loop parallel-processing parallel.foreach doparallel

I am learning parallel processing as a way to handle some huge data sets.

I have some predefined variables, as below:

CV <- function(mean, sd) {(sd / mean) * 100} 
distThreshold <- 5 # Distance threshold 
CVThreshold <- 20 # CV threshold 

LocalCV <- list()
Num.CV <- list()
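
For a quick sanity check (my own illustration, not part of the original post), a neighbourhood whose yields have mean 10 and standard deviation 2 sits exactly at the CV threshold:

CV(mean = 10, sd = 2)  # (2 / 10) * 100 = 20, i.e. equal to CVThreshold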

Then I load the parallel library and export the basic variables and libraries to the cluster:

library(parallel)
clust_cores <- makeCluster(detectCores(logical = T) ) 
clusterExport(clust_cores, c("i","YieldData2rd","CV", "distThreshold", "CVThreshold"))
clusterEvalQ(clust_cores, library(sp))

Then I pass the cluster object clust_cores to parSapply:

for (i in seq(YieldData2rd)) {
  LocalCV[[i]] = parSapply(clust_cores, X = 1:length(YieldData2rd[[i]]), 
                   FUN = function(pt) {
                     d = spDistsN1(YieldData2rd[[i]], YieldData2rd[[i]][pt,])
                     ret = CV(mean = mean(YieldData2rd[[i]][d < distThreshold, ]$yield), 
                              sd = sd(YieldData2rd[[i]][d < distThreshold, ]$yield))
                     return(ret)
                   }) # calculate CV in the local neighbour 
}

stopCluster(clust_cores) 
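
As an aside (my own sketch, not part of the original post): one way to keep the worker function from depending on the exported copy of i is to pass the current layer to parSapply as an extra argument, which parSapply forwards to FUN. A minimal sketch, assuming the same objects as above and a cluster that is still running (i.e. before stopCluster is called):

# Sketch: hand the current SpatialPointsDataFrame to the workers explicitly,
# instead of indexing YieldData2rd[[i]] with an exported loop variable.
for (i in seq_along(YieldData2rd)) {
  layer <- YieldData2rd[[i]]
  LocalCV[[i]] <- parSapply(clust_cores, X = seq_len(length(layer)),
                            FUN = function(pt, layer) {
                              d <- spDistsN1(layer, layer[pt, ])   # distances from point pt to all points
                              CV(mean = mean(layer[d < distThreshold, ]$yield),
                                 sd   = sd(layer[d < distThreshold, ]$yield))
                            },
                            layer = layer)  # extra argument passed through ... to FUN
}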

Then I got:

Error in checkForRemoteErrors(val) : 6 nodes produced errors; first error: subscript out of bounds
Warning messages: 1: closing unused connection (<-localhost:11688)

Please let me know how to fix this problem.

For a reproducible example, I created a large list object that runs fine in the original for loop without the parallel-processing component (a sketch of that serial loop is given after the example data below).

library('rgdal')

Yield1 <- data.frame(yield=rnorm(460, mean = 10), x1=rnorm(460, mean = 1843235), x2=rnorm(460,mean = 5802532))
Yield2 <- data.frame(yield=rnorm(408, mean = 10), x1=rnorm(408, mean = 1843235), x2=rnorm(408, mean = 5802532))
Yield3 <- data.frame(yield=rnorm(369, mean = 10), x1=rnorm(369, mean = 1843235), x2=rnorm(369, mean = 5802532))

coordinates(Yield1) <- c('x1', 'x2')
coordinates(Yield2) <- c('x1', 'x2')
coordinates(Yield3) <- c('x1', 'x2')

YieldData2rd <- list(Yield1, Yield2, Yield3)
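
For reference, here is a minimal sketch of the serial version of the loop referred to above (my reconstruction from the parallel code, not copied from the original post):

library(sp)  # provides coordinates() and spDistsN1(); also attached via rgdal above
LocalCV <- list()
for (i in seq_along(YieldData2rd)) {
  layer <- YieldData2rd[[i]]                        # current SpatialPointsDataFrame
  LocalCV[[i]] <- sapply(seq_len(length(layer)), function(pt) {
    d <- spDistsN1(layer, layer[pt, ])              # distances from point pt to all points
    CV(mean = mean(layer[d < distThreshold, ]$yield),
       sd   = sd(layer[d < distThreshold, ]$yield))
  })
}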

1 Answer:

Answer 0 (score: 0):

Thanks to the comment from @Omry Atia, I started looking into the foreach package and came up with a first attempt.

library(foreach)
library(doParallel)

# setup parallel backend to use many processors
cores <- detectCores()
clust_cores <- makeCluster(cores[1] - 1) # not to overload your computer
registerDoParallel(clust_cores)

LocalCV = foreach(i = seq(YieldData2rd), .combine = list, .multicombine = TRUE) %dopar% {
  LocalCV[[i]] = sapply(X = 1:length(YieldData2rd[[i]]),
                        FUN = function(pt) {
                          d = spDistsN1(YieldData2rd[[i]], YieldData2rd[[i]][pt, ])
                          ret = CV(mean = mean(YieldData2rd[[i]][d < distThreshold, ]$yield),
                                   sd = sd(YieldData2rd[[i]][d < distThreshold, ]$yield))
                          return(ret)
                        }) # calculate CV in the local neighbour
}

stopCluster(clust_cores)

This prints out the whole result without the need to put LocalCV in front of everything.
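
One caveat worth noting (my addition, not from the original answer): with a PSOCK cluster, which is what makeCluster creates by default, the workers start as fresh R sessions and do not inherit attached packages, so it is safer to hand sp to them through foreach's .packages argument, roughly like this:

# Same loop as above, but with sp attached on every worker via .packages
LocalCV <- foreach(i = seq(YieldData2rd), .combine = list, .multicombine = TRUE,
                   .packages = "sp") %dopar% {
  sapply(X = 1:length(YieldData2rd[[i]]), FUN = function(pt) {
    d <- spDistsN1(YieldData2rd[[i]], YieldData2rd[[i]][pt, ])
    CV(mean = mean(YieldData2rd[[i]][d < distThreshold, ]$yield),
       sd   = sd(YieldData2rd[[i]][d < distThreshold, ]$yield))
  })
}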

I will try the new code on some huge data sets and see how fast it can get.

Reference: run a for loop in parallel in R