I have more than 500 files (df1) in a folder, and I want to create a new file from each one by merging it with a reference table (nf1).
data[1] <- Composite.REF Call Confidence
SNP_A-2131660 2 0.0053
SNP_A-1967418 2 0.0075
SNP_A-1969580 2 0.0042
SNP_A-4263484 2 0.0052
nf1 <-
Composite.REF dbSNP.RS.ID Chromosome Physical.Position Allele.A Allele.B Gene region
SNP_A-2131660 rs4147951 2 66943738 A G ABCA8 intron
SNP_A-1967418 rs2022235 2 14326088 C T --- downstream
SNP_A-1969580 rs6425720 2 31709555 A G NKAIN1 intron
SNP_A-4263484 rs12997193 2 106584554 A C --- upstream
finalFile <-
Composite.REF dbSNP.RS.ID Chromosome Physical.Position Allele.A Allele.B Gene region data[1]
SNP_A-1969580 rs6425720 2 31709555 A G NKAIN1 intron 0.042
listFiles <- list.files(pattern = "data\\.txt$", recursive = TRUE) # list all files whose names end in data.txt
for (i in seq_along(listFiles)) {
  data <- read.table(file = listFiles[i], sep = "\t", skip = 1, header = TRUE)
  dataF <- data[data$Confidence < 0.05, ] # keep rows below the confidence cutoff
  finalFile <- merge(dataF, nf1, by = "Composite.Element.REF") # merge the two tables on their common column
  write.table(finalFile, sub("data\\.txt$", "data_new.txt", listFiles[i]), sep = "\t", row.names = FALSE, quote = FALSE) # save the output
}
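This takes a long time to finish because it loops over the samples one at a time. Is there a more elegant way to do this?
Answer 0 (score: 1)
It is very hard to answer this without some data, but the plyr package lets you do something like this: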
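library(plyr)
names(listFiles) <- listFiles # name each path so that ldply records it in the .id column
data.main <- ldply(listFiles, read.table, sep = "\t", skip = 1, header = TRUE) # load all files into one data frame
data.main <- subset(data.main, Confidence < 0.05) # reduce data by cutoff value
data.main <- merge(data.main, nf1, by = "Composite.Element.REF") # merge data sets
# write out one file per input file
d_ply(data.main, .(.id), function(x) {
  file.name <- sub("data\\.txt$", "data_new.txt", as.character(x$.id[1])) # derive the output path from the input path
  write.table(x[names(x) != ".id"], file.name, sep = "\t", row.names = FALSE, quote = FALSE) # save the output without the .id helper column
})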
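If you would rather stay in base R, here is a rough sketch of the same idea with the per-file work wrapped in a function. It rests on a few assumptions that are not in the question: that the ID column is called Composite.Element.REF in both tables (as in the merge call), and that each ID appears only once in nf1, so that match() can stand in for merge(). The helper name process_file and the tiny demo data are made up for illustration.

```r
# Hypothetical helper: filter one file and attach the annotation columns from ref.
# Assumes every ID occurs at most once in ref, so match() behaves like an inner merge.
process_file <- function(path, ref, cutoff = 0.05, key = "Composite.Element.REF") {
  dat <- read.table(path, sep = "\t", skip = 1, header = TRUE, stringsAsFactors = FALSE)
  dat <- dat[!is.na(dat$Confidence) & dat$Confidence < cutoff, , drop = FALSE]
  idx <- match(dat[[key]], ref[[key]])   # row of ref for each remaining probe
  keep <- !is.na(idx)                    # drop probes that are missing from ref
  out <- cbind(ref[idx[keep], , drop = FALSE],
               dat[keep, setdiff(names(dat), key), drop = FALSE])
  out_path <- sub("data\\.txt$", "data_new.txt", path)
  write.table(out, out_path, sep = "\t", row.names = FALSE, quote = FALSE)
  out_path                               # return the path that was written
}

# Made-up demo data so that the sketch runs on its own
nf1 <- data.frame(
  Composite.Element.REF = c("SNP_A-2131660", "SNP_A-1967418", "SNP_A-1969580"),
  dbSNP.RS.ID = c("rs4147951", "rs2022235", "rs6425720"),
  stringsAsFactors = FALSE
)
tmp_dir <- tempfile("snp_demo")
dir.create(tmp_dir)
writeLines(c(
  "first line, dropped by skip = 1",
  "Composite.Element.REF\tCall\tConfidence",
  "SNP_A-2131660\t2\t0.0053",
  "SNP_A-1967418\t2\t0.0750",
  "SNP_A-1969580\t2\t0.0042"
), file.path(tmp_dir, "sample1.data.txt"))

listFiles <- list.files(tmp_dir, pattern = "data\\.txt$", recursive = TRUE, full.names = TRUE)
written <- vapply(listFiles, process_file, character(1), ref = nf1)
result <- read.table(written[[1]], sep = "\t", header = TRUE, stringsAsFactors = FALSE)
print(result)
```

Because every file is handled independently, the vapply call could probably be swapped for parallel::mclapply(listFiles, process_file, ref = nf1, mc.cores = 4) on Linux or macOS. Whether that helps depends on how much of the time is spent reading and writing files, so it is worth timing on a handful of files first.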