R foreach loop runs out of memory in an HPC environment

Date: 2018-12-19 04:22:08

Tags: r memory hpc r-raster parallel-foreach

I am using the foreach package in R to process raster files.

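The R code below works fine locally (on Windows) when adapted for an 8-core processor, but runs out of memory in an HPC environment with 48 cores. The HPC environment has far more memory available than my local box (2 TB shared across all 48 cores, versus 32 GB locally), so memory capacity should not be the limiting factor.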
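
As a rough check of the budget described above, the figures from the question can be turned into a per-worker estimate. This is only a back-of-the-envelope sketch: it assumes 2 TB means 2048 GB, that memory is shared evenly across workers, and (for the last part) that every worker starts its own JVM under the `-Xmx39g` setting used in the script. Note that `-Xmx` is a ceiling on the Java heap, not an up-front allocation.

```r
# Figures from the question; 2 TB is assumed to mean 2048 GB.
hpc_total_gb   <- 2048
hpc_workers    <- 48
local_total_gb <- 32
local_workers  <- 8
jvm_xmx_gb     <- 39   # from options(java.parameters = "-Xmx39g")

# Even share of memory per worker: roughly 42.7 GB on the HPC vs 4 GB locally
per_worker_gb <- c(hpc   = hpc_total_gb / hpc_workers,
                   local = local_total_gb / local_workers)
print(round(per_worker_gb, 1))

# Upper bound, assuming every worker runs its own JVM and each grows to its -Xmx ceiling
worst_case_jvm_gb <- hpc_workers * jvm_xmx_gb   # 48 * 39 = 1872 GB
print(worst_case_jvm_gb / hpc_total_gb)         # fraction of the 2 TB total
```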

Memory creep occurs as the foreach loop progresses. It is slow, but enough to eventually exhaust the available memory.
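
One way to make the creep visible is to log memory from inside each iteration. The helpers below are a hypothetical sketch, not part of the original script. `heap_mb()` reports only R-managed memory via `gc()`, so growth in the JVM heap or in native GDAL buffers would show up only in `rss_mb()`, which reads `VmRSS` from `/proc/self/status` and therefore works on Linux only (it returns `NA` elsewhere). The names `heap_mb`, `rss_mb` and `log_memory` are made up for illustration.

```r
# Hypothetical helper: R-managed memory in use (MB); column 2 of gc() is "(Mb)" for "used"
heap_mb <- function() sum(gc()[, 2])

# Hypothetical helper: resident set size of this process in MB (Linux only)
rss_mb <- function() {
  status <- "/proc/self/status"
  if (!file.exists(status)) return(NA_real_)
  line <- grep("^VmRSS:", readLines(status), value = TRUE)
  if (length(line) == 0) return(NA_real_)
  as.numeric(gsub("[^0-9]", "", line)) / 1024   # VmRSS is reported in kB
}

# Hypothetical helper: write one line per call to stderr, tagged with the worker's PID
log_memory <- function(label) {
  message(sprintf("[pid %d] %s: R heap %.1f MB, RSS %.1f MB",
                  Sys.getpid(), label, heap_mb(), rss_mb()))
}

log_memory("example")
```

With `makeCluster(..., outfile = ...)`, worker output should end up in that file. Calling something like `log_memory(paste(taxon, tmpScenarioName))` at the end of the inner `for` loop would show whether the R heap stays flat while the process RSS keeps growing, which would suggest memory held outside R's garbage collector.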

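I have tried switching the parallel backend to doMC and doSNOW, adding lots of garbage-collection calls and rm() calls on large objects at the end of each iteration, fiddling with the number of cores, and deleting all temporary files immediately.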
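
For reference, the per-iteration cleanup described above looks roughly like the sketch below. It uses only base R: `process_scenario()` is a hypothetical stand-in for the crop/mask/predict step, with a plain matrix in place of the raster stack, and it is run with `lapply()` so it executes without a cluster. In the original script the same `rm()`/`gc()` calls would sit at the end of the inner loop inside the `%dopar%` body.

```r
# Hypothetical stand-in for one crop/mask/predict step
process_scenario <- function(n_cells) {
  preds <- matrix(runif(n_cells * 20), ncol = 20)  # stand-in for the 20-layer predictor stack
  prediction <- rowMeans(preds)                    # stand-in for dismo::predict()
  out <- summary(prediction)                       # keep only a small result
  rm(preds, prediction)                            # drop references to the large objects
  invisible(gc())                                  # ask R to reclaim them now
  out
}

results <- lapply(c(1e4, 2e4), process_scenario)
```

Keep in mind that `rm()` and `gc()` act directly only on R's own heap; memory held by the JVM or by native libraries is returned only when those components release it.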

Any ideas about what might be causing my memory problems?

```r
# Set Java heap maximum for the JVM started by rJava (must be set before the JVM starts)
options(java.parameters = "-Xmx39g")

library(sp)
library(raster)
library(dismo)
library(foreach)
library(doParallel)
library(rgdal)
library(rJava)

# Set directories
relPath <- "E:/BIEN_Cactaceae/"
bufferDir <- "Data/Buffers"
climDir <- "Data/FutureClimate/"
outDir <- "Analyses/FutureRanges/"
modelDir <- "Analyses/MaxEnt/"
outfileDir <- "OutFiles/"
tempDir <- "E:/Tmp/"

# Set directory for raster temporary files
rasterOptions(tmpdir = tempDir)

# Search for models
models <- list.files(path = paste0(relPath, modelDir), pattern = "rda$")

# Set up cluster (FORK clusters are only available on Unix-alikes such as the HPC)
cl <- makeCluster(48, type = "FORK",
                  outfile = paste0(relPath, outfileDir, "predictFuture.txt"))
registerDoParallel(cl)

# Loop through species and predict future ranges
foreach(i = seq_along(models),
        .packages = c("sp", "raster", "dismo", "rgdal", "rJava"),
        .inorder = FALSE) %dopar% {
  # Get taxon
  taxon <- strsplit(models[i], ".", fixed = TRUE)[[1]][1]
  # Get buffer
  tmpBuffer <- readOGR(dsn = paste0(relPath, bufferDir),
                       layer = paste0(taxon, "_buff"), verbose = FALSE)
  # Get scenarios
  scenarios <- list.files(path = paste0(relPath, climDir), pattern = "tif$")
  # Get model (the .rda file provides the object model_all used below)
  load(paste0(relPath, modelDir, models[i]))
  # Loop over scenarios
  for (j in scenarios) {
    # Get scenario name and output file
    tmpScenarioName <- strsplit(j, ".", fixed = TRUE)[[1]][1]
    outFile <- paste0(relPath, outDir, taxon, "_", tmpScenarioName, ".tif")
    # Skip scenario if already processed
    if (!file.exists(outFile)) {
      print(paste0(taxon, " - ", tmpScenarioName, ": processing"))
      # Read, crop, mask predictors
      tmpScenarioStack <- raster::stack(paste0(relPath, climDir, j))
      preds <- raster::crop(tmpScenarioStack, tmpBuffer)
      preds <- raster::mask(preds, tmpBuffer)
      # Rename predictors (assumes each scenario stack has 20 layers)
      tmpNames <- paste0(taxon, ".", 1:20)
      tmpNames <- gsub("-", ".", tmpNames, fixed = TRUE)
      tmpNames <- gsub(" ", "_", tmpNames, fixed = TRUE)
      names(preds) <- tmpNames
      # Predict with model
      prediction <- dismo::predict(model_all, preds, progress = "")
      # Export predictions
      writeRaster(prediction, outFile)
      # Remove raster temporary files older than 2 hours
      removeTmpFiles(h = 2)
    }
  }
}

stopCluster(cl)
```

0 answers