Error in serialize(data, node$con): error writing to connection

Asked: 2018-06-27 04:10:20

Tags: r parallel-processing

I am currently trying to run some code that uses parallel processing, but I keep hitting this error:

Error: cannot allocate vector of size 2.1 Gb
Execution halted
Error in serialize(data, node$con) : error writing to connection
Calls: %dopar% ... postNode -> sendData -> sendData.SOCKnode -> serialize
Execution halted
Warning message:
system call failed: Cannot allocate memory 
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> 
unserialize Execution halted

I can't figure out why there is a memory problem. If I take the code out of the foreach loop, or change the foreach to a plain for loop, it works fine, so I don't think the problem is in the body of the code itself; it seems related to the parallelization. Also, the error is thrown shortly after the code starts executing. Any ideas why this is happening? Here is my code:

list_storer <- list() 

list_storer <- foreach(bt=2:bootreps, .combine=list, .multicombine=TRUE) %dopar% {
ur <- sample.int(nrow(dailydatyr),nrow(dailydatyr),replace=TRUE)
ddyr_boot <- dailydatyr[ur,]
weightvar <- ddyr_boot[,c('ymd1_IssueD','MatD_ymd2')]
weightvar <- abs(weightvar)
x <- DM[ur,]
y<-log(ddyr_boot$dirtyprice2/ddyr_boot$dirtyprice1)
weightings <- rep(1,nrow(ddyr_boot))
weightings <- weightings/(ddyr_boot$datenum2-ddyr_boot$datenum1)

treg <- repeatsales(y,x,maxdailyreturn,weightings,weightvar)


zbtcol <- 0
cnst <- NULL


if (is.null(dums) == FALSE){
  zbtcol <- length(treg)-ncol(x)
  cnst <- paste("tbs(",dums,")_",(middleyr),sep="")
  if (is.null(interactVar) == FALSE){ 
    ninteract <- (length(treg)-ncol(x)-length(dums))/length(dums)
    interact <- unlist(lapply(cnst,function(xla) paste(xla,"*c",c(1:ninteract),sep="")))
    cnst <- c(cnst,interact)}
  }

tregtotal <- tregtotal + (is.na(treg)==FALSE)
treg[is.na(treg)==TRUE] <- 0

list_storer[[length(list_storer)+1]] <- treg

}

stopImplicitCluster(cl)

1 Answer:

Answer 0 (score: 3)

The parallelization done by foreach is a space-time tradeoff: we get faster execution at the cost of higher memory use. The reason memory usage is higher is that several R processes are started, and each of them needs its own memory to hold the data required for the computation. Currently foreach is using an implicit PSOCK cluster. One way to address the problem is to create the cluster explicitly with a lower number of processes. How low depends on the amount of memory you have and on the memory requirements of each job:

n <- parallel::detectCores()/2 # experiment!
cl <- parallel::makeCluster(n)
doParallel::registerDoParallel(cl)
<foreach>
parallel::stopCluster(cl)
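Putting that recipe together, a minimal self-contained sketch might look like the following. The bootstrap body here is a toy stand-in (computing resampled means of a random vector), not the original repeatsales() code, and the choice of half the detected cores is just a starting point to experiment with:

```r
library(foreach)
library(doParallel)

# Explicitly create a smaller PSOCK cluster instead of relying on the
# implicit one; fewer workers means fewer copies of the data in memory.
n_workers <- max(1, parallel::detectCores() %/% 2)  # experiment!
cl <- parallel::makeCluster(n_workers)
doParallel::registerDoParallel(cl)

dat <- rnorm(1000)
bootreps <- 50

# The last expression in the loop body is the value foreach collects;
# there is no need to assign into a list manually inside the loop.
boot_means <- foreach(bt = 1:bootreps, .combine = c) %dopar% {
  ur <- sample.int(length(dat), length(dat), replace = TRUE)
  mean(dat[ur])
}

parallel::stopCluster(cl)

length(boot_means)  # one bootstrap estimate per replication
```

Note that foreach returns the combined results itself, so collecting them via the loop's return value (rather than appending to a pre-existing list inside the `%dopar%` block, as in the question) is both correct and avoids shipping extra state to the workers.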