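Creating new data frames by merging with reference data: is there a better way than a for loop?

Date: 2016-10-12 14:18:31

Tags: r

I have more than 500 files (df1) in a folder, and I want to create a new file from each one by merging it with a reference table (nf1).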

data[1] <- Composite.REF Call Confidence
          SNP_A-2131660    2     0.0053
          SNP_A-1967418    2     0.0075
          SNP_A-1969580    2     0.0042
          SNP_A-4263484    2     0.0052
nf1 <- 
    Composite.REF   dbSNP.RS.ID Chromosome      Physical.Position Allele.A  Allele.B   Gene    region
        SNP_A-2131660   rs4147951       2           66943738           A         G        ABCA8   intron 
        SNP_A-1967418   rs2022235       2           14326088           C         T        ---     downstream 
        SNP_A-1969580   rs6425720       2           31709555           A         G        NKAIN1  intron 
        SNP_A-4263484   rs12997193      2           106584554          A         C        ---     upstream 

finalFile <-

        Composite.REF   dbSNP.RS.ID Chromosome      Physical.Position Allele.A  Allele.B   Gene    region    data[1]
         SNP_A-1969580   rs6425720       2           31709555           A         G        NKAIN1  intron      0.042




listFiles <- list.files(pattern = "data\\.txt$", recursive = TRUE)  # list all files whose names end in data.txt

    for (i in seq_along(listFiles)) {
        data <- read.table(file = listFiles[i], sep = "\t", skip = 1, header = TRUE)
        dataF <- data[data$Confidence < 0.05, ]  # keep rows below the confidence cutoff
        finalFile <- merge(dataF, nf1, by = "Composite.Element.REF")  # merge the two tables on their common column
        write.table(finalFile, gsub("data\\.txt$", "data_new.txt", listFiles[i]), sep = "\t", row.names = FALSE, quote = FALSE)  # save the output
    }

This takes a long time to finish because it loops over one sample at a time. Is there a more elegant way to do this job?

1 answer:

Answer 0 (score: 1)

It is very hard to answer this without some data, but the plyr package lets you do something like this:

library(plyr)

# name each element so that ldply() records the source file in the .id column
names(listFiles) <- listFiles

data.main <- ldply(listFiles, read.table, sep = "\t", skip = 1, header = TRUE) # load all files
data.main <- subset(data.main, Confidence < 0.05) # reduce data by cutoff value
data.main <- merge(data.main, nf1, by = 'Composite.Element.REF') # merge data sets

# write out one file per input file
d_ply(data.main, .(.id), function(x) {
    file.name <- gsub("data\\.txt$", "data_new.txt", as.character(x$.id[1]))
    x$.id <- NULL  # drop the helper column before saving
    write.table(x, file.name, sep = "\t", row.names = FALSE, quote = FALSE)   # save the output
})
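
The same read, filter, merge once, split idea can also be sketched in base R without plyr. This is only a sketch under assumptions: the toy files, the `merge_demo` folder, the `read_one` helper, and the `source_file` column are made up for illustration, and the key column is assumed to be `Composite.Element.REF` as in the question's code. Whether it is actually faster depends on where the time goes; if reading 500 files dominates, merging once will not help much.

```r
# Toy setup (hypothetical data) so the sketch runs on its own
tmp <- file.path(tempdir(), "merge_demo")
dir.create(tmp, showWarnings = FALSE)

nf1 <- data.frame(
  Composite.Element.REF = c("SNP_A-2131660", "SNP_A-1967418"),
  dbSNP.RS.ID = c("rs4147951", "rs2022235"),
  stringsAsFactors = FALSE
)

for (s in c("s1", "s2")) {
  con <- file(file.path(tmp, paste0(s, ".data.txt")), "w")
  writeLines("extra first line, skipped by skip = 1", con)
  write.table(
    data.frame(Composite.Element.REF = nf1$Composite.Element.REF,
               Call = 2, Confidence = c(0.0053, 0.0750)),
    con, sep = "\t", row.names = FALSE, quote = FALSE)
  close(con)
}

# Read and filter one file, remembering which file the rows came from
read_one <- function(f) {
  d <- read.table(f, sep = "\t", skip = 1, header = TRUE, stringsAsFactors = FALSE)
  d <- d[d$Confidence < 0.05, ]
  d$source_file <- rep(f, nrow(d))  # also safe when no rows survive the filter
  d
}

listFiles <- list.files(tmp, pattern = "data\\.txt$", recursive = TRUE, full.names = TRUE)

# Stack all filtered files, then merge with the reference table a single time
all_data <- do.call(rbind, lapply(listFiles, read_one))
merged <- merge(all_data, nf1, by = "Composite.Element.REF")

# Split the merged table back by source file and write one output per input
for (piece in split(merged, merged$source_file)) {
  out <- sub("data\\.txt$", "data_new.txt", piece$source_file[1])
  piece$source_file <- NULL  # drop the helper column before saving
  write.table(piece, out, sep = "\t", row.names = FALSE, quote = FALSE)
}
```

Note that an input file with no rows left after filtering produces no output file here, whereas the original loop would write a file containing only the header.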