Question

I am running into memory problems when using parLapply inside a for loop.

Below is a toy example of what I am trying to do.

library(parallel) # provides detectCores, makeCluster, parLapply, stopCluster

nosim <- 5 #in reality at least 300
nosteps <- 10 # in reality at least 15
datalist <- datalistgrow <- as.list(rep(10,nosim)) #in reality this contains big datasets
simlist <- list()

for(i in 1:nosteps){
    ncores <- detectCores()
    cl <- makeCluster(ncores-1) #set up nodes
    out <- parLapply(cl,datalistgrow,function(x){
        fit <- lm(x~seq_along(x)) # fit a more complex model that takes ~5 seconds to run
        res <- coef(fit)[1] #do a whole bunch of calculations
        return(res)
    })
    stopCluster(cl)
    new <- lapply(datalist,function(x) rnorm(1,x,0.1)) #simulate data for the new time step
    datalistgrow <- lapply(1:nosim,function(x) c(datalistgrow[[x]],new[[x]])) #add new data to the list (in reality the amount of data only increases by about 2%)
    # ... calculate several statistics
    simlist[[i]] <- unlist(out) # put all statistics in a list
}

The problem is that after a while I run out of memory and RStudio stops working (my laptop becomes very slow...). Here is a screenshot of my memory usage history:

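Each "hill" represents one time step, and you can see the sharp drop after the cluster is stopped. I do not understand why every new time step uses more memory, since parLapply always does the same thing, albeit on a slightly larger (~2%) dataset. I would expect the later "hills" to be only slightly bigger.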

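I have rewritten my function so that I do not save the actual model fit (which is heavy) but only small derived quantities, and I have looked at, for example, cluster types and garbage collection to try to reduce memory usage. However, I have not been able to fully solve or understand the problem.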
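For reference, here is a minimal sketch of one variation along those lines: the cluster is created once before the loop instead of once per time step, and each worker is asked to run a garbage collection after every step. Whether this actually removes the memory growth is not something I have confirmed; the fixed worker count of 2 is an assumption made only to keep the sketch light.

```r
library(parallel)

nosim <- 5
nosteps <- 10
datalist <- datalistgrow <- as.list(rep(10, nosim))
simlist <- vector("list", nosteps)

# Assumption: 2 workers for the sketch; the toy example uses detectCores() - 1.
cl <- makeCluster(2)

for (i in seq_len(nosteps)) {
    out <- parLapply(cl, datalistgrow, function(x) {
        fit <- lm(x ~ seq_along(x)) # stand-in for the expensive model
        unname(coef(fit)[1])        # return only the small derived quantity
    })
    # Ask every worker to garbage-collect; return NULL so nothing large is sent back.
    invisible(clusterEvalQ(cl, { gc(); NULL }))
    new <- lapply(datalist, function(x) rnorm(1, x, 0.1))
    datalistgrow <- lapply(seq_len(nosim), function(k) c(datalistgrow[[k]], new[[k]]))
    simlist[[i]] <- unlist(out)
}

stopCluster(cl) # stop the workers once, after all time steps
```

Reusing the cluster avoids starting and tearing down the worker processes at every step, but it also means worker memory is no longer released by stopCluster between steps, which is why the explicit gc() call on the workers is included.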

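To be precise, my questions are:

Why does the parLapply part use more memory at every step of the for loop?
What can I do to avoid this?
Is there a function in R that stops all computation once about 95% of memory is in use, instead of crashing?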
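To make the last question concrete, this is roughly the kind of guard I have in mind, written as a sketch. `mem_used_mb` and `stop_if_over` are hypothetical helper names, not existing R functions, and the limit is an absolute number of Mb that the caller has to choose, because I do not know a portable base R call that reports total system RAM. It also only measures the master R session, not the worker processes.

```r
# Hypothetical helper: memory currently used by this R session, in Mb.
mem_used_mb <- function() {
    # Column 2 of the gc() matrix is the memory in use (Mb) for Ncells and Vcells.
    sum(gc()[, 2])
}

# Hypothetical helper: raise an error once usage exceeds limit_mb.
stop_if_over <- function(limit_mb) {
    used <- mem_used_mb()
    if (used > limit_mb) {
        stop(sprintf("Memory guard: %.0f Mb in use exceeds the limit of %.0f Mb",
                     used, limit_mb))
    }
    invisible(used)
}
```

Calling `stop_if_over(limit_mb)` at the top of each loop iteration would abort the run with an ordinary R error instead of letting the machine grind to a halt. The same check could be sent to the workers with `clusterEvalQ`, assuming the helpers are first exported to them.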

Thanks

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

How to reduce memory usage when doing parallel computation in a for loop
