Question

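I want to compare a single column from each of two files. The files are in one folder and are listed in the order in which they should be compared, for example: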

File_1a
File_1b
File_2a
File_2b
File_3a
File_3b

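I want to run a function that compares a single column from each of the two files and then outputs a number. I have confirmed that the comparison code itself works, so that is not the problem. For each comparison I want to plot the number (this also works fine).

This is what I have so far, but I am stuck on how to go through all the files in the folder and how to keep the output so that I can plot it. Thanks in advance.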

df <- read.delim(file.choose(),header=TRUE)
df2 <- read.delim(file.choose(),header=TRUE)
View(df)

total <- merge(df,df2,by="Start")
total[,5][total[,5] == "2"] <- "d"
total[,9][total[,9] == "2"] <- "d"
View(total)
total[,5][total[,5] < 2 & total[,5] !="d"] <- "l"
total[,9][total[,9] < 2 & total[,9] !="d"] <- "l"
View(total)
total[,5][total[,5] > 2 & total[,5] !="l" & total[,5] !="d"] <- "g"
total[,9][total[,9] > 2 & total[,9] !="l" & total[,9] !="d"] <- "g"
View(total)
total$agree <- ifelse((total[,5] == total[,9]),"agree","disagree") 
View(total)

print(sum(total$agree == "agree")/nrow(total)*100)
print(sum(total$agree == "disagree")/nrow(total)*100)

An example dataset, in the format that all the files should share, is:

Chromosome Start rt med CN
         1     1  2   4  2
         1    10  1   2  3
        10     1  1   3  2

I want the comparisons to be made between the consecutive pairs of files as numbered above.

Answer 1

If you want to run the operation on each pair of files that share the same number, i.e. (File_1a and File_1b, File_2a and File_2b), you could do this. (I just copied and pasted your code, since you mentioned it runs fine. The steps could be simplified if you showed a few rows of the dataset.)

# files: character vector of the file names (its definition is not shown in this answer)
# the gsub() call strips the trailing letter and extension, so File_1a.txt and File_1b.txt share the key File_1
lstN <- lapply(split(files, gsub("[A-Za-z]\\..*", "", files)), function(.files) {
  total <- Reduce(function(...) merge(..., by='Start'),
                  lapply(.files, function(x) read.table(x, header=TRUE)))
  total[,5][total[,5]=='2'] <- 'd'
  total[,9][total[,9]=='2'] <- 'd'
  total[,5][total[,5] < 2 & total[,5] !="d"] <- "l"
  total[,9][total[,9] < 2 & total[,9] !="d"] <- "l"
  total[,5][total[,5] > 2 & total[,5] !="l" & total[,5] !="d"] <- "g"
  total[,9][total[,9] > 2 & total[,9] !="l" & total[,9] !="d"] <- "g"
  total$agree <- ifelse((total[,5] == total[,9]),"agree","disagree")
  print(sum(total$agree == "agree")/nrow(total)*100)
  print(sum(total$agree == "disagree")/nrow(total)*100)
  total
})

Update

Using the data in the updated post as well as the format of the uploaded dataset:

lapply(lstN, head,2)
# $File_1
#  Start Chromosome.x Ratio.x MedianRatio.x CopyNumber.x Chromosome.y   Ratio.y
#1     1            1      -1            -1            d            1  0.697902
#2     1            1      -1            -1            d            X -1.000000
#  MedianRatio.y CopyNumber.y    agree
#1        1.2794            g disagree
#2       -1.0000            d    agree

#$File_2
#  Start Chromosome.x rt.x med.x CN.x Chromosome.y rt.y med.y CN.y agree
#1     1           10    1     3    d            5    2    13    d agree
#2     1           10    1     3    d           10    1     3    d agree

UPDATE2

Use read.delim, specifying the path with getwd():

lstN <- lapply(split(files, gsub("[A-Za-z]\\..*", "", files)), function(.files) {
  total <- Reduce(function(...) merge(..., by='Start'),
                  lapply(.files, function(x) read.delim(paste(getwd(), x, sep="/"), header=TRUE, sep='')))
  total[,5][total[,5]=='2'] <- 'd'
  total[,9][total[,9]=='2'] <- 'd'
  total[,5][total[,5] < 2 & total[,5] !="d"] <- "l"
  total[,9][total[,9] < 2 & total[,9] !="d"] <- "l"
  total[,5][total[,5] > 2 & total[,5] !="l" & total[,5] !="d"] <- "g"
  total[,9][total[,9] > 2 & total[,9] !="l" & total[,9] !="d"] <- "g"
  total$agree <- ifelse((total[,5] == total[,9]),"agree","disagree")
  print(sum(total$agree == "agree")/nrow(total)*100)
  print(sum(total$agree == "disagree")/nrow(total)*100)
  total
})

Data

I created 4 files, i.e. File_1a.txt, File_1b.txt, File_2a.txt and File_2b.txt, in the working directory.
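The question also asks how to keep one number per comparison so that it can be plotted. The following is only a minimal sketch of one way to do that, not the code from the answer: it assumes tab-delimited files named like File_1a.txt in a temporary folder, uses invented values with unique Start positions, and introduces two hypothetical helpers, classify and percent_agree. It also classifies the CN column numerically before any conversion to text, which is a substitution for the in-place string replacement used in the question.

```r
# Hypothetical setup: write two small file pairs into a temporary folder so the
# sketch runs on its own. All values below are invented for illustration.
dir <- file.path(tempdir(), "pairs_demo")
dir.create(dir, showWarnings = FALSE)
make_file <- function(name, cn) {
  df <- data.frame(Chromosome = c(1, 1, 10), Start = c(1, 10, 20),
                   rt = c(2, 1, 1), med = c(4, 2, 3), CN = cn)
  write.table(df, file.path(dir, name), sep = "\t", quote = FALSE,
              row.names = FALSE)
}
make_file("File_1a.txt", c(2, 3, 2))
make_file("File_1b.txt", c(2, 1, 2))
make_file("File_2a.txt", c(1, 2, 3))
make_file("File_2b.txt", c(1, 2, 3))

# List the files and group them by the part of the name before the a/b suffix,
# so that File_1a.txt and File_1b.txt end up together under the key "File_1".
files <- list.files(dir, pattern = "^File_[0-9]+[ab]\\.txt$")
pairs <- split(files, sub("[ab]\\.txt$", "", files))

# Classify a copy number as "l" (< 2), "d" (== 2) or "g" (> 2).
classify <- function(cn) ifelse(cn < 2, "l", ifelse(cn == 2, "d", "g"))

# Percentage of rows (matched on Start) whose classes agree for one pair.
percent_agree <- function(pair) {
  a <- read.delim(file.path(dir, pair[1]))
  b <- read.delim(file.path(dir, pair[2]))
  total <- merge(a, b, by = "Start", suffixes = c(".a", ".b"))
  mean(classify(total$CN.a) == classify(total$CN.b)) * 100
}

# One number per pair, kept in a named vector instead of only being printed.
agreement <- sapply(pairs, percent_agree)
print(agreement)
```

Because agreement is a named numeric vector, it can be passed straight to a plotting call such as barplot(agreement, ylab = "% agreement"). With real data where Start values repeat across chromosomes, merging on Start alone would match rows many-to-many, so merging on both Chromosome and Start may be closer to what is intended.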

Recursively applying a function to pairs of files in a folder
