Cannot allocate memory even after removing large objects and closing connections

Date: 2017-08-24 10:17:31

Tags: r doparallel

I am getting intermittent error messages in R after using foreach with %dopar% (via doParallel).

After starting a fresh session I ran a %dopar% loop once and everything worked: my script ran, loaded a corpus, transformed it, and output a matrix.

If I then do rm(list = ls()) and closeAllConnections(), I cannot do anything afterwards without hitting a memory error.

Trying to run the function again with slightly changed parameters gives "Error in mcfork() : unable to fork, possible reason: Cannot allocate memory". In fact, anything else I try also errors. For example, entering sessionInfo() gives:

"Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) : cannot popen '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory'"

I tried opening a shell from within RStudio (hosted RStudio, Tools > Shell) and got the popup "Cannot allocate memory".

I have tried adding stopImplicitCluster() after my %dopar% loop, as well as closeAllConnections().
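Concretely, the post-loop cleanup I have been trying looks like the sketch below. The registerDoSEQ() and gc() calls are extra steps I am experimenting with, not part of my original script:

    # sketch of post-loop cleanup; registerDoSEQ() and gc() are
    # additions I am experimenting with, not in the original script
    stopImplicitCluster()  # shut down the implicit doParallel workers
    registerDoSEQ()        # point foreach back at the sequential backend
    closeAllConnections()  # close any remaining connections
    rm(list = ls())        # remove all objects from the workspace
    gc()                   # trigger garbage collection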

I am not sure where to go from here. Does this sound familiar to anyone?

I noticed in top (after pressing 1 in the terminal) that all of the cores show 100% in a sleeping state, but I am not sure what that means. Here is a screenshot: [screenshot of top output]
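To see which R processes are still hanging around, I can also run something like this from within R (a sketch; the ps flags assume Linux, and will differ on a Mac):

    # sketch: list R processes with pid, resident memory, and state
    system("ps -o pid,rss,stat,command -C R")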

I am not sure what other information to provide.

Here is the script: it runs perfectly in a fresh session, and then seems to leave me with no memory.

# requires: library(doParallel), library(foreach), library(tm), library(qdap)
clean_corpus <- function(corpus, n = 1000) { # n is the length of each piece in parallel processing

  # split the corpus into pieces for looping to get around memory issues with transformation
  nr <- length(corpus)
  pieces <- split(corpus, rep(1:ceiling(nr/n), each=n, length.out=nr))
  lenp <- length(pieces)

  rm(corpus) # save memory

  # save pieces to rds files since there is not enough RAM to hold them all
  tmpfile <- tempfile() 
  for (i in seq_len(lenp)) {
    saveRDS(pieces[[i]],
            paste0(tmpfile, i, ".rds"))
  }

  rm(pieces) # save memory since the pieces are now saved in tmp rds files

  # doparallel
  registerDoParallel(cores = 12)
  pieces <- foreach(i = seq_len(lenp)) %dopar% {
    # read this piece back from its rds file
    piece <- readRDS(paste0(tmpfile, i, ".rds"))
    # spelling update based on lut
    piece <- tm_map(piece, function(i) stringi_spelling_update(i, spellingdoc))
    # regular transformations
    piece <- tm_map(piece, removeNumbers)
    piece <- tm_map(piece, content_transformer(removePunctuation), preserve_intra_word_dashes = TRUE)
    piece <- tm_map(piece, content_transformer(function(x, ...) 
      qdap::rm_stopwords(x, stopwords = tm::stopwords("english"), separate = FALSE)))
    saveRDS(piece, paste0(tmpfile, i, ".rds"))
    return(1) # hack: return a scalar so %dopar% does not keep the piece in memory, since it is now saved to rds
  } 

  stopImplicitCluster() # I added this, but according to the documentation I don't think I need it, since implicit clusters are closed automatically by doParallel?

  # combine the pieces back into one corpus
  corpus <- list()
  corpus <- foreach(i = seq_len(lenp)) %do% {
    corpus[[i]] <- readRDS(paste0(tmpfile, i, ".rds"))
  }
  corpus <- do.call(function(...) c(..., recursive = TRUE), corpus)
  return(corpus)

} # end clean_corpus function

0 Answers:

No answers