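I want to compare one column in each of two files. The files are in a single folder and are listed in the order in which they should be compared, for example: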
File_1a
File_1b
File_2a
File_2b
File_3a
File_3b
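I want to run a function that compares a single column from each of the two files and outputs a number. My code for this works fine, so that part is not important. For each comparison I want to plot the number (this also works fine).
This is what I have so far, but I am stuck on how to go through all the files in the folder and how to keep the output so that I can plot it. Thanks in advance.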
df <- read.delim(file.choose(),header=TRUE)
df2 <- read.delim(file.choose(),header=TRUE)
View(df)
total <- merge(df,df2,by="Start")
total[,5][total[,5] == "2"] <- "d"
total[,9][total[,9] == "2"] <- "d"
View(total)
total[,5][total[,5] < 2 & total[,5] !="d"] <- "l"
total[,9][total[,9] < 2 & total[,9] !="d"] <- "l"
View(total)
total[,5][total[,5] > 2 & total[,5] !="l" & total[,5] !="d"] <- "g"
total[,9][total[,9] > 2 & total[,9] !="l" & total[,9] !="d"] <- "g"
View(total)
total$agree <- ifelse((total[,5] == total[,9]),"agree","disagree")
View(total)
print(sum(total$agree == "agree")/nrow(total)*100)
print(sum(total$agree == "disagree")/nrow(total)*100)
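One thing worth checking in the labelling steps above: once "d" is assigned into a numeric column, R coerces the whole column to character, so the later `< 2` and `> 2` tests become string comparisons (for example, "10" < "2" is TRUE). A minimal sketch that assigns the labels while the values are still numeric is shown below. `classify_cn` is a hypothetical helper name, not something from the post, and the sketch assumes the column holds plain numeric copy numbers.

```r
# Hypothetical helper (not part of the original code): label copy numbers
# as "d" (equal to 2), "l" (less than 2) or "g" (greater than 2)
# while the values are still numeric.
classify_cn <- function(x) {
  x <- as.numeric(x)
  ifelse(x == 2, "d", ifelse(x < 2, "l", "g"))
}

cn <- c(2, 3, 1, 10, -1)
labels <- classify_cn(cn)
print(labels)
```

With labels produced this way, the agreement column could be computed as `classify_cn(total[,5]) == classify_cn(total[,9])` without rewriting the columns in place.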
An example dataset, in the format that all files should share, is:
Chromosome Start rt med CN
1 1 2 4 2
1 10 1 2 3
10 1 1 3 2
I want the comparison to be made between consecutive pairs of files, as numbered above.
Answer 0 (score: 0)
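If you want to perform the operation on the same columns in each pair of files, i.e. (File_1a and File_1b, File_2a and File_2b), you could do this (I just copied and pasted your code, since you mentioned it works fine. The steps could be simplified if you showed a few lines of the dataset.)
# Assumption: 'files' is not defined in the answer as it survives here; it is
# presumed to be a character vector of file names, e.g.
# files <- list.files(pattern = "^File_")
lstN <- lapply(split(files, gsub("[A-Za-z]\\..*", "", files)),
  function(.files) {
    # read both files of the pair and merge them on 'Start'
    total <- Reduce(function(...) merge(..., by='Start'),
                    lapply(.files, function(x) read.table(x, header=TRUE)))
    # column 5 is the copy number from the first file, column 9 from the second
    total[,5][total[,5]=='2'] <- 'd'
    total[,9][total[,9]=='2'] <- 'd'
    total[,5][total[,5] < 2 & total[,5] !="d"] <- "l"
    total[,9][total[,9] < 2 & total[,9] !="d"] <- "l"
    total[,5][total[,5] > 2 & total[,5] !="l" & total[,5] !="d"] <- "g"
    total[,9][total[,9] > 2 & total[,9] !="l" & total[,9] !="d"] <- "g"
    total$agree <- ifelse((total[,5] == total[,9]),"agree","disagree")
    # percentage of rows that agree / disagree for this pair
    print(sum(total$agree == "agree")/nrow(total)*100)
    print(sum(total$agree == "disagree")/nrow(total)*100)
    total
  })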
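The pairing in the code above depends on the key that `gsub` builds from each file name. As a small illustration, using the four file names mentioned in this answer, the pattern strips the trailing letter and the extension, so both members of a pair end up with the same key:

```r
# File names as used in this answer; the pattern removes the letter before
# the dot together with everything after it ("a.txt", "b.txt").
files <- c("File_1a.txt", "File_1b.txt", "File_2a.txt", "File_2b.txt")
key <- gsub("[A-Za-z]\\..*", "", files)
print(key)

# split() then groups the file names by that key, one list element per pair
pairs <- split(files, key)
print(pairs)
```

Note that the pattern needs a dot in the name. For names without an extension, such as File_1a in the question, nothing would be removed and each file would form its own group, so the pattern would have to be adjusted (for example to drop only the last letter).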
Using the data format from the updated post, as well as the uploaded dataset:
lapply(lstN, head, 2)
# $File_1
# Start Chromosome.x Ratio.x MedianRatio.x CopyNumber.x Chromosome.y Ratio.y
#1 1 1 -1 -1 d 1 0.697902
#2 1 1 -1 -1 d X -1.000000
# MedianRatio.y CopyNumber.y agree
#1 1.2794 g disagree
#2 -1.0000 d agree
#$File_2
# Start Chromosome.x rt.x med.x CN.x Chromosome.y rt.y med.y CN.y agree
#1 1 10 1 3 d 5 2 13 d agree
#2 1 10 1 3 d 10 1 3 d agree
Specifying the path with getwd() and reading with read.delim. I created 4 files in the working directory, namely File_1a.txt, File_1b.txt, File_2a.txt and File_2b.txt.
# Assumption: 'files' is not defined in the answer as it survives here; it is
# presumed to be a character vector of file names, e.g.
# files <- list.files(pattern = "^File_")
lstN <- lapply(split(files, gsub("[A-Za-z]\\..*", "", files)),
  function(.files) {
    # build the full path from the working directory, read both files of the
    # pair (whitespace-separated) and merge them on 'Start'
    total <- Reduce(function(...) merge(..., by='Start'),
                    lapply(.files, function(x) read.delim(paste(getwd(),
                           x, sep="/"), header=TRUE, sep='')))
    total[,5][total[,5]=='2'] <- 'd'
    total[,9][total[,9]=='2'] <- 'd'
    total[,5][total[,5] < 2 & total[,5] !="d"] <- "l"
    total[,9][total[,9] < 2 & total[,9] !="d"] <- "l"
    total[,5][total[,5] > 2 & total[,5] !="l" & total[,5] !="d"] <- "g"
    total[,9][total[,9] > 2 & total[,9] !="l" & total[,9] !="d"] <- "g"
    total$agree <- ifelse((total[,5] == total[,9]),"agree","disagree")
    print(sum(total$agree == "agree")/nrow(total)*100)
    print(sum(total$agree == "disagree")/nrow(total)*100)
    total
  })
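The question also asks how to keep the output so it can be plotted, whereas the code above only prints the percentages. One possible approach, sketched here under assumptions, is to return the percentage from a function and collect one value per pair with `sapply`. The helper names (`classify_cn`, `percent_agree`, `write_toy`), the toy values and the temporary folder are all hypothetical. The column names `CN.x` and `CN.y` assume the sample format from the question (Chromosome, Start, rt, med, CN) after merging on Start, and the toy data uses unique Start values to keep the merge one-to-one.

```r
# Hypothetical helpers, not from the original answer.
classify_cn <- function(x) ifelse(x == 2, "d", ifelse(x < 2, "l", "g"))

percent_agree <- function(pair_files) {
  a <- read.delim(pair_files[1], header = TRUE, sep = "")
  b <- read.delim(pair_files[2], header = TRUE, sep = "")
  total <- merge(a, b, by = "Start")   # CN becomes CN.x and CN.y
  mean(classify_cn(total$CN.x) == classify_cn(total$CN.y)) * 100
}

# Toy files in a temporary folder, purely for illustration
toy_dir <- tempfile("pairs_")
dir.create(toy_dir)
write_toy <- function(name, cn) {
  df <- data.frame(Chromosome = c(1, 1, 10), Start = c(1, 10, 20),
                   rt = c(2, 1, 1), med = c(4, 2, 3), CN = cn)
  write.table(df, file.path(toy_dir, name), sep = "\t",
              quote = FALSE, row.names = FALSE)
}
write_toy("File_1a.txt", c(2, 3, 2))
write_toy("File_1b.txt", c(2, 1, 2))
write_toy("File_2a.txt", c(1, 3, 2))
write_toy("File_2b.txt", c(1, 4, 2))

# Group the files into pairs and keep one percentage per pair
files <- list.files(toy_dir, pattern = "^File_", full.names = TRUE)
pairs <- split(files, gsub("[A-Za-z]\\..*", "", basename(files)))
pct_agree <- sapply(pairs, percent_agree)
print(pct_agree)
```

Because `pct_agree` is a named numeric vector, it could then be passed directly to a plotting call such as `barplot(pct_agree, ylab = "% agreement")`.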