Question

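I am reading CSV files from a directory that contains more than 100 files, and then doing some processing on each one. I have an 8-core CPU, so I want to run this in parallel to make it faster.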

I wrote some code, but it doesn't work for me (I'm on Linux):

library(data.table)
library(parallel)

# Calculate the number of cores
no_cores <- detectCores() - 1
# Initiate cluster
cl <- makeCluster(no_cores)

processFile <- function(f) {

  # reading file by data.table 
  df <- fread(f,colClasses = c(NA,NA, NA,"NULL", "NULL", "NULL"))

  A <- parLapply(cl,sapply(windows, function(w) {return(numOverlaps(w,df))}))

  stopCluster(cl)
}

files <- dir("/home/shared/", recursive=TRUE, full.names=TRUE, pattern=".*\\.txt$")

# Apply the function to all files.

result <- sapply(files, processFile)

As you can see, I want to run the function inside processFile (the one assigned to A) in parallel, but it doesn't work!

How can I run this function in parallel?

Answer 1

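You've got the concept. You need to pass parLapply the list of files and let it process them. The anonymous function should carry out the whole job for a single file (read it, process it) and return the result you need.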
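A minimal sketch of that idea, assuming a hypothetical per-file worker `process_one_file` that uses base `read.csv` instead of `fread` (to keep it dependency-free) and returns only a row count as a stand-in for the real processing. The cluster is created once and stopped once, outside the per-file function:

```r
library(parallel)

# Hypothetical per-file worker: does the whole job for ONE file and returns
# its result. The row count is only a placeholder for the real processing.
process_one_file <- function(path) {
  df <- read.csv(path)
  nrow(df)
}

# Create a few small example files as stand-ins for the real directory.
tmp_dir <- tempfile("csvs_")
dir.create(tmp_dir)
for (i in 1:4) {
  write.csv(data.frame(x = seq_len(i * 10)),
            file.path(tmp_dir, paste0("f", i, ".csv")), row.names = FALSE)
}
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

cl <- makeCluster(2)                        # small cluster for the example
rows_per_file <- parLapply(cl, X = files, fun = process_one_file)
stopCluster(cl)                             # stop once, after all files are done

names(rows_per_file) <- basename(files)
print(unlist(rows_per_file))
```

Each worker receives whole file paths, so the parallelism is at the file level rather than inside the per-file computation.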

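My suggestion is to first get this working with a plain lapply or sapply, then start the parallel backend and export all the libraries and objects the function needs.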
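That workflow might look like the sketch below. `windows` and `count_overlaps` are hypothetical stand-ins for the question's `windows` and `numOverlaps()` (their real definitions are not shown), and base `read.csv` replaces `fread`. Workers started by `makeCluster` are fresh R sessions, so global objects the function uses must be sent with `clusterExport`, and packages must be loaded on each worker with `clusterEvalQ`:

```r
library(parallel)

# Hypothetical stand-ins for the question's `windows` and `numOverlaps()`:
# each window is c(start, end); count_overlaps counts positions inside it.
windows <- list(c(1, 5), c(3, 8), c(10, 12))
count_overlaps <- function(w, df) sum(df$pos >= w[1] & df$pos <= w[2])

process_one_file <- function(path) {
  df <- read.csv(path)
  sapply(windows, function(w) count_overlaps(w, df))
}

tmp_dir <- tempfile("csvs_")
dir.create(tmp_dir)
write.csv(data.frame(pos = 1:6), file.path(tmp_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(pos = 4:11), file.path(tmp_dir, "b.csv"), row.names = FALSE)
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Step 1: make it work sequentially first.
seq_result <- lapply(files, process_one_file)

# Step 2: start the workers and give them what the function needs.
cl <- makeCluster(2)
# If the function used data.table, load it on every worker, e.g.:
# clusterEvalQ(cl, library(data.table))
clusterExport(cl, c("windows", "count_overlaps"))
par_result <- parLapply(cl, files, process_one_file)
stopCluster(cl)

# The parallel run should reproduce the sequential one exactly.
stopifnot(identical(seq_result, par_result))
```

Comparing against the sequential result is a cheap way to catch objects that were forgotten in `clusterExport`.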

parLapply(cl, X = files, fun = function(f, ...) {
  # read the single file f, process it, and return its result here
})
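Note that `parLapply` names its function argument `fun` (lowercase), and any extra named arguments are forwarded to that function through `...`. The sketch below, built around a hypothetical worker `count_in_windows` and a hypothetical wrapper `run_parallel`, uses this to pass `windows` directly instead of exporting it, and uses `on.exit` so the cluster is stopped exactly once even if a worker fails:

```r
library(parallel)

# Hypothetical worker: extra arguments after the file path arrive via
# parLapply's `...`, so `windows` need not be exported with clusterExport().
count_in_windows <- function(path, windows) {
  pos <- read.csv(path)$pos
  vapply(windows, function(w) sum(pos >= w[1] & pos <= w[2]), integer(1))
}

# Hypothetical wrapper: the cluster is created once and stopped once,
# even if a worker raises an error (on.exit runs on any exit).
run_parallel <- function(files, windows,
                         n_workers = max(1L, detectCores() - 1L, na.rm = TRUE)) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)
  res <- parLapply(cl, X = files, fun = count_in_windows, windows = windows)
  setNames(res, basename(files))
}

tmp_dir <- tempfile("csvs_")
dir.create(tmp_dir)
write.csv(data.frame(pos = 1:6), file.path(tmp_dir, "a.csv"), row.names = FALSE)
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

result <- run_parallel(files, windows = list(c(1, 5), c(3, 8)), n_workers = 2)
print(result)
```

On the 8-core machine from the question, leaving `n_workers` at its default would start 7 workers, matching the `detectCores() - 1` choice in the original code.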

How to run code in parallel in R?
