Question

I have to run this operation on 10,200 text files:

s[s$POS == sample[tail(which(sample$obs_pval == min(sample$obs_pval)), 1), 1], ]

and then write the output of this operation for each file into a single file, like this:

ID                   CHROM  POS
20_49715203_T_C_b37  20     49715203

So in the end I will have one file with 10,200 lines.

Right now my code looks like this:

# collect the file paths only; each file is read inside the loop
fileNames <- Sys.glob("ENSG*.txt")
s=read.table("snpPos", header=TRUE)

for (fileName in fileNames) {

  # read original data:
  sample <- read.table(fileName,
    header = TRUE,
    sep = ",")

  # create new data based on contents of original file:
  allEQTLs <- data.frame(
    File = fileName,
    EQTLs = s[s$POS==sample[tail(which(sample$obs_pval == min(sample$obs_pval)), 1),1],])

  # write new data to separate file:
  write.table(allEQTLs, 
    "EQTLs.txt",
    append = TRUE,
    sep = ",",
    row.names = FALSE,
    col.names = FALSE)
}

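Right now I am doing this the standard way, and it takes a lot of time. Is there a better/more efficient way to write this code? I should also mention that each of these ENSG*.txt files has at least 4,000 lines. The largest file has 15 million lines.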

Answer 1

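If most of the time is spent on read/write operations, try fread and fwrite from the data.table package. (You can check whether that is the case with R's profiling tools, such as the Rprof function.)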
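A minimal sketch of that suggestion, assuming the data.table package is installed. The helper names top_eqtl and collect_eqtls, the demo file names, and the made-up SNP IDs are illustrative and not part of the original post. As in the question's code, it assumes the first column of each ENSG file holds the position. Results are collected in memory and written once with fwrite instead of appending 10,200 times.

```r
library(data.table)

# Hypothetical helper: for one file, find the last row with the smallest
# obs_pval and return the matching row(s) of `snps`, tagged with the file name.
top_eqtl <- function(path, snps) {
  dt  <- fread(path)                       # fast reader, detects the separator
  idx <- tail(which(dt$obs_pval == min(dt$obs_pval)), 1)
  pos_value <- dt[[1]][idx]                # assumption: column 1 is the position
  hit <- snps[POS == pos_value]
  if (nrow(hit) == 0L) return(NULL)        # no matching SNP: skip this file
  hit[, File := path]
  setcolorder(hit, "File")                 # put File first, as in the question
  hit
}

# Hypothetical driver: process all files, then write the result in one call.
collect_eqtls <- function(files, snps, out) {
  result <- rbindlist(lapply(files, top_eqtl, snps = snps))
  fwrite(result, out)
  result
}

# Tiny demo on made-up data in a temporary directory
tmp <- tempfile("eqtl_demo")
dir.create(tmp)
fwrite(data.table(pos = c(100L, 200L, 300L), obs_pval = c(0.5, 0.01, 0.2)),
       file.path(tmp, "ENSG_demo1.txt"))
fwrite(data.table(pos = c(100L, 300L), obs_pval = c(0.03, 0.03)),
       file.path(tmp, "ENSG_demo2.txt"))
snps <- data.table(ID = c("snp_a", "snp_b", "snp_c"), CHROM = 20L,
                   POS = c(100L, 200L, 300L))

files  <- Sys.glob(file.path(tmp, "ENSG*.txt"))
result <- collect_eqtls(files, snps, file.path(tmp, "EQTLs.txt"))
print(result)

# To see where the time goes on the real data, wrap the call in the profiler:
# Rprof("profile.out"); collect_eqtls(...); Rprof(NULL)
# summaryRprof("profile.out")$by.total
```

On the real data, files would be Sys.glob("ENSG*.txt") and snps would be fread("snpPos"). If the profile shows that reading dominates, fread's select argument can restrict reading to the position and obs_pval columns, which may help on the 15-million-line files.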
