Recursively apply a function to pairs of files in a folder

Date: 2014-11-29 23:04:04

Tags: r

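I want to compare one column from each of two files. The files are in a single folder and are listed in the order in which they should be compared, e.g.: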

File_1a
File_1b
File_2a
File_2b
File_3a
File_3b

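I want to run a function that compares a single column from each of the two files and outputs a number. The code for that comparison works fine, so it isn't the issue. For each comparison I want to plot the number (this also works fine).

Here is what I have done so far, but I'm stuck on how to loop through all the files in the folder and how to store the output so that I can plot it. Thanks in advance.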

df <- read.delim(file.choose(),header=TRUE)
df2 <- read.delim(file.choose(),header=TRUE)
View(df)

total <- merge(df,df2,by="Start")
total[,5][total[,5] == "2"] <- "d"
total[,9][total[,9] == "2"] <- "d"
View(total)
total[,5][total[,5] < 2 & total[,5] !="d"] <- "l"
total[,9][total[,9] < 2 & total[,9] !="d"] <- "l"
View(total)
total[,5][total[,5] > 2 & total[,5] !="l" & total[,5] !="d"] <- "g"
total[,9][total[,9] > 2 & total[,9] !="l" & total[,9] !="d"] <- "g"
View(total)
total$agree <- ifelse((total[,5] == total[,9]),"agree","disagree") 
View(total)

print(sum(total$agree == "agree")/nrow(total)*100)
print(sum(total$agree == "disagree")/nrow(total)*100)
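One caveat with the recoding above: once "d" has been assigned into columns 5 and 9, those columns become character, so the later < 2 and > 2 tests compare strings. A copy number of 10 would then be labelled "l", because "10" < "2" as text. A minimal sketch of a numeric classification, using a hypothetical helper classify_cn (not part of the original code) and assuming the CN values are read as numbers:

```r
# Hypothetical helper: classify copy numbers numerically as
# "l" (below 2), "d" (exactly 2) or "g" (above 2)
classify_cn <- function(cn) {
  cn <- as.numeric(cn)  # force numeric comparisons instead of lexical ones
  ifelse(cn < 2, "l", ifelse(cn == 2, "d", "g"))
}

# The pitfall: a character value compared with 2 is compared as text
print("10" < 2)
print(classify_cn(c(1, 2, 3, 10)))

# Percentage of positions where two CN vectors fall in the same class
agree_pct <- function(cn_a, cn_b) {
  mean(classify_cn(cn_a) == classify_cn(cn_b)) * 100
}
print(agree_pct(c(2, 3, 1), c(2, 10, 3)))
```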

An example dataset (all files should have the same format) is:

Chromosome Start rt med CN
1          1     2  4   2
1          10    1  2   3
10         1     1  3   2

I want the comparisons to be between consecutive pairs of files, numbered as above.
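Forming those consecutive pairs can be done from the file names alone. A minimal sketch, assuming the names follow the File_<n>a / File_<n>b pattern listed above (the vector is typed in by hand here; on disk it could come from list.files()):

```r
# Hypothetical file names following the File_<n>a / File_<n>b convention
files <- c("File_1a", "File_1b", "File_2a", "File_2b", "File_3a", "File_3b")

# Strip the trailing letter (and an optional extension such as ".txt")
# so that both members of a pair share the same id, e.g. "File_1"
pair_id <- sub("[A-Za-z](\\.[^.]*)?$", "", files)

# One list element per pair, each holding the two file names
pairs <- split(files, pair_id)
print(pairs)
```

split() orders the groups alphabetically, so with ten or more pairs "File_10" sorts between "File_1" and "File_2"; that matters only if the plotting order should follow the numbers.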

1 Answer:

Answer 0 (score: 0)

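If you want to perform the operations on the same columns in each pair of files, i.e. (File_1a, File_1b) and (File_2a, File_2b), you could do the following. (I just copied/pasted your code, since you mentioned it works well. The steps could be simplified if you showed a few lines of the dataset.)

files <- list.files(pattern = "^File_")  # assumed: only the paired files match this pattern
lstN <- lapply(split(files, gsub("[A-Za-z]\\..*", "", files)), function(.files) {
  # Merge the two files of a pair on Start; with the five-column format,
  # columns 5 and 9 then hold the CN column of the first and second file
  total <- Reduce(function(...) merge(..., by='Start'),
                  lapply(.files, function(x) read.table(x, header=TRUE)))
  total[,5][total[,5]=='2'] <- 'd'
  total[,9][total[,9]=='2'] <- 'd'
  total[,5][total[,5] < 2 & total[,5] !="d"] <- "l"
  total[,9][total[,9] < 2 & total[,9] !="d"] <- "l"
  total[,5][total[,5] > 2 & total[,5] !="l" & total[,5] !="d"] <- "g"
  total[,9][total[,9] > 2 & total[,9] !="l" & total[,9] !="d"] <- "g"
  total$agree <- ifelse((total[,5] == total[,9]),"agree","disagree")
  print(sum(total$agree == "agree")/nrow(total)*100)
  print(sum(total$agree == "disagree")/nrow(total)*100)
  total
})

Update

Using the format from the updated post and the uploaded dataset, the first two rows of each merged pair are: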

lapply(lstN, head, 2)
#$File_1
#  Start Chromosome.x Ratio.x MedianRatio.x CopyNumber.x Chromosome.y   Ratio.y
#1     1            1      -1            -1            d            1  0.697902
#2     1            1      -1            -1            d            X -1.000000
#  MedianRatio.y CopyNumber.y    agree
#1        1.2794            g disagree
#2       -1.0000            d    agree
#
#$File_2
#  Start Chromosome.x rt.x med.x CN.x Chromosome.y rt.y med.y CN.y agree
#1     1           10    1     3    d            5    2    13    d agree
#2     1           10    1     3    d           10    1     3    d agree

Update 2

Using read.delim, with the path specified via getwd():

lstN <- lapply(split(files, gsub("[A-Za-z]\\..*", "", files)), function(.files) {
  total <- Reduce(function(...) merge(..., by='Start'),
                  lapply(.files, function(x) read.delim(paste(getwd(), x, sep="/"),
                                                        header=TRUE, sep='')))
  total[,5][total[,5]=='2'] <- 'd'
  total[,9][total[,9]=='2'] <- 'd'
  total[,5][total[,5] < 2 & total[,5] !="d"] <- "l"
  total[,9][total[,9] < 2 & total[,9] !="d"] <- "l"
  total[,5][total[,5] > 2 & total[,5] !="l" & total[,5] !="d"] <- "g"
  total[,9][total[,9] > 2 & total[,9] !="l" & total[,9] !="d"] <- "g"
  total$agree <- ifelse((total[,5] == total[,9]),"agree","disagree")
  print(sum(total$agree == "agree")/nrow(total)*100)
  print(sum(total$agree == "disagree")/nrow(total)*100)
  total
})

Data

I created 4 files, i.e. File_1a.txt, File_1b.txt, File_2a.txt and File_2b.txt, in the working directory.
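To keep the output for plotting rather than only printing it, the per-pair percentages can be collected into a data frame. The following is a sketch under stated assumptions, not the answer's code: compare_pairs and classify_cn are hypothetical helpers; the files are assumed to be whitespace-delimited with a header containing Chromosome, Start and CN; and the merge uses Chromosome and Start instead of Start alone, because in the sample data Start = 1 occurs on two chromosomes, so merging on Start alone would pair unrelated rows. The first demo file reuses the sample rows from the question; the second file's CN values are made up.

```r
# Hypothetical helper: numeric d/l/g classification of copy numbers
classify_cn <- function(cn) {
  cn <- as.numeric(cn)
  ifelse(cn < 2, "l", ifelse(cn == 2, "d", "g"))
}

# Hypothetical helper: compare every File_<n>a / File_<n>b pair in `dir`
# and return one row per pair with the % of rows that agree / disagree
compare_pairs <- function(dir = getwd()) {
  files <- list.files(dir, pattern = "^File_[0-9]+[ab]")
  pair_id <- sub("[A-Za-z](\\.[^.]*)?$", "", files)
  pct <- vapply(split(files, pair_id), function(pair) {
    if (length(pair) != 2) {
      stop("expected exactly two files per pair: ", paste(pair, collapse = ", "))
    }
    a <- read.table(file.path(dir, pair[1]), header = TRUE)
    b <- read.table(file.path(dir, pair[2]), header = TRUE)
    # Assumption: Chromosome + Start identify a row in both files
    total <- merge(a, b, by = c("Chromosome", "Start"))
    mean(classify_cn(total$CN.x) == classify_cn(total$CN.y)) * 100
  }, numeric(1))
  data.frame(pair = names(pct), agree = unname(pct),
             disagree = 100 - unname(pct), stringsAsFactors = FALSE)
}

# Demo: write two hypothetical pairs to a temporary folder
demo_dir <- file.path(tempdir(), "cn_pairs")
dir.create(demo_dir, showWarnings = FALSE)
sample_rows <- data.frame(Chromosome = c(1, 1, 10), Start = c(1, 10, 1),
                          rt = c(2, 1, 1), med = c(4, 2, 3), CN = c(2, 3, 2))
changed_rows <- transform(sample_rows, CN = c(2, 1, 2))  # made-up CN values
write.table(sample_rows, file.path(demo_dir, "File_1a.txt"), row.names = FALSE, quote = FALSE)
write.table(changed_rows, file.path(demo_dir, "File_1b.txt"), row.names = FALSE, quote = FALSE)
write.table(sample_rows, file.path(demo_dir, "File_2a.txt"), row.names = FALSE, quote = FALSE)
write.table(sample_rows, file.path(demo_dir, "File_2b.txt"), row.names = FALSE, quote = FALSE)

res <- compare_pairs(demo_dir)
print(res)

# Plot only in an interactive session, one bar per pair
if (interactive()) barplot(res$agree, names.arg = res$pair, ylab = "% agree")
```

Because the results come back as a data frame, res$agree and res$disagree can be passed to any plotting function; barplot() is used here only as an example.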