I'm getting intermittent error messages in R after using a doParallel %dopar% loop.
The first time I use the %dopar% loop after starting a fresh session, everything works: my script runs, grabs a corpus, transforms it, and outputs a matrix.
But if I then run rm(list = ls()) and closeAllConnections(), I can't do anything afterwards without getting memory errors.
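That is, the reset I do between runs is just this sequence:

rm(list = ls())
closeAllConnections()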
Trying to run the function again with slightly changed arguments gives "Error in mcfork() : unable to fork, possible reason: Cannot allocate memory".
In fact, anything else I try gives an error. For example, typing sessionInfo() gives:
"Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) : cannot popen '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory'"
I tried opening a shell from hosted RStudio (Tools > Shell) and it gave a popup saying "Cannot allocate memory".
I've tried adding stopImplicitCluster() after my %dopar% loop, as well as closeAllConnections().
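In other words, the end of the loop now looks roughly like this (closeAllConnections() isn't in the full listing further down; I tried it in the same spot):

pieces <- foreach(i = seq_len(lenp)) %dopar% {
  # ... transformations as in the full function below ...
}
stopImplicitCluster()
closeAllConnections()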
I don't know where to go from here. Does this sound familiar to anyone?
I also noticed that in a terminal, running top (and pressing 1 to show each core), all cores appear to be at 100% sleeping, but I'm not sure what that means. Here's a screenshot:
I'm not sure what other information to provide.
Here is the script that runs perfectly in a fresh session and then seems to leave me with no memory:
# packages the function relies on (doParallel also loads foreach);
# stringi_spelling_update() and spellingdoc are my own helper and lookup table, defined elsewhere
library(doParallel)
library(tm)
library(qdap)

clean_corpus <- function(corpus, n = 1000) { # n is the length of each piece for parallel processing
  # split the corpus into pieces for looping, to get around memory issues with the transformations
  nr <- length(corpus)
  pieces <- split(corpus, rep(1:ceiling(nr / n), each = n, length.out = nr))
  lenp <- length(pieces)
  rm(corpus) # save memory

  # save pieces to rds files since there is not enough RAM to hold everything
  tmpfile <- tempfile()
  for (i in seq_len(lenp)) {
    saveRDS(pieces[[i]], paste0(tmpfile, i, ".rds"))
  }
  rm(pieces) # save memory, since the pieces are now in the tmp rds files

  # doParallel
  registerDoParallel(cores = 12)
  pieces <- foreach(i = seq_len(lenp)) %dopar% {
    piece <- readRDS(paste0(tmpfile, i, ".rds"))
    # spelling update based on lut
    piece <- tm_map(piece, function(x) stringi_spelling_update(x, spellingdoc))
    # regular transformations
    piece <- tm_map(piece, removeNumbers)
    piece <- tm_map(piece, content_transformer(removePunctuation), preserve_intra_word_dashes = TRUE)
    piece <- tm_map(piece, content_transformer(function(x, ...)
      qdap::rm_stopwords(x, stopwords = tm::stopwords("english"), separate = FALSE)))
    saveRDS(piece, paste0(tmpfile, i, ".rds"))
    return(1) # hack to get %dopar% to forget the piece and save memory, since it is now saved to rds
  }
  stopImplicitCluster() # I added this, but according to the documentation I don't think I need to, since implicit clusters are closed automatically by doParallel?

  # combine the pieces back into one corpus
  corpus <- list()
  corpus <- foreach(i = seq_len(lenp)) %do% {
    corpus[[i]] <- readRDS(paste0(tmpfile, i, ".rds"))
  }
  corpus <- do.call(function(...) c(..., recursive = TRUE), corpus)
  return(corpus)
} # end clean_corpus function
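For reference, the call looks roughly like this (the object name is just a placeholder for my real corpus, which is a tm corpus built earlier in the script):

corpus_clean <- clean_corpus(my_corpus, n = 1000)  # my_corpus: placeholder name for the corpus I load earlier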