我正在编写一个R代码来计算/选择两列中绝对差值的行,这两行的值小于某个值(比如0.1)。它读取两个文件和两个列号,根据这些文件进行计算。
args<-commandArgs(TRUE)
f1<-args[1]
f2<-args[2]
col1<-args[3]
col2<-args[4]
if (length(args) <4) {
stop("\n\nUsage: file1 file2 column1 column2 >output\n\n")
}
pearsoncor<-function(in1,in2,x,y){
# Read in data
file1<-read.table(in1,header=F)
file2<-read.table(in2,header=F)
col1<-as.integer(x)
col2<-as.integer(y)
# Calculate pearson correlation
cor_r <- which(file1[,col1] != '.' & file2[,col2] != '.')
file1.filter<-file1[cor_r,col1]
file2.filter<-file2[cor_r,col2]
x=cor.test(c(file1.filter),c(file2.filter),method="pearson")
fname=paste(basename(in1), " and ", basename(in2),sep="")
print(fname)
print(x)
conc_c <- 0
for (i in 1:length(cor_r))
{
dif<-abs(file1.filter[i,] - file2.filter[i,])
if (dif <0.1){
conc_c = conc_c +1
}
}
print(conc_c)
}
pearsoncor(f1,f2,col1,col2)
测试文件:两个以制表符分隔的测试文件。
==> t1.txt <==
chr1 10468 10470 . 1 + chr1 10468 10470 0.762 2
chr1 10470 10472 . 2 + chr1 10470 10472 0.738 2
chr1 10483 10485 . 3 + chr1 10483 10485 0.865 2
chr1 10488 10490 . 4 + chr1 10488 10490 0.825 2
chr1 10492 10494 . 5 + chr1 10492 10494 0.894 2
chr1 10496 10498 . 6 + chr1 10496 10498 0.859 2
chr1 10524 10526 . 7 + chr1 10524 10526 0.954 2
chr1 10541 10543 . 8 + chr1 10541 10543 0.876 2
chr1 10562 10564 . 9 + chr1 10562 10564 0.829 2
chr1 10570 10572 . 10 + chr1 10570 10572 . 2
==> t2.txt <==
chr1 10468 10470 . 1 + chr1 10468 10470 0.69 2
chr1 10470 10472 . 2 + chr1 10470 10472 0.7 2
chr1 10483 10485 . 3 + chr1 10483 10485 0.911 2
chr1 10488 10490 . 4 + chr1 10488 10490 0.894 2
chr1 10492 10494 . 5 + chr1 10492 10494 0.714 2
chr1 10496 10498 . 6 + chr1 10496 10498 0.857 2
chr1 10524 10526 . 7 + chr1 10524 10526 0.984 2
chr1 10541 10543 . 8 + chr1 10541 10543 0.955 2
chr1 10562 10564 . 9 + chr1 10562 10564 0.981 2
chr1 10570 10572 . 10 + chr1 10570 10572 0.979 2
运行:
Rscript script.r t1.txt t2.txt 10 10
我无法在代码中完成绝对值计算..它给出了这个错误:
Error in `[.default`(file1.filter, i, ) : incorrect number of dimensions
Calls: pearsoncor.jgu -> [ -> [.factor -> NextMethod
Execution halted
此错误消息的第一行也来自此处:
> file1.filter[1,]
Error in `[.default`(file1.filter, 1, ) : incorrect number of dimensions
> dput(head(file1.filter))
c(0.762, 0.738, 0.865, 0.825, 0.894, 0.859)
答案 0 :(得分:2)
file1.filter
和file2.filter
都是向量,因此语法file1.filter[i,]
导致错误,只需使用file1.filter[i]
即可纠正(您还需要使用{ {1}})。
话虽如此,你使用file2.filter[i]
循环所做的只是计算两个向量元素相差不超过0.1的次数:
for
这可以在R:
中的单行中完成conc_c <- 0
for (i in 1:length(cor_r))
{
dif<-abs(file1.filter[i,] - file2.filter[i,])
if (dif <0.1){
conc_c = conc_c +1
}
}
print(conc_c)
基本上print(sum(abs(file1.filter - file2.filter) < 0.1))
返回每个元素是否小于0.1的向量,abs(file1.filter - file2.filter) < 0.1
为每个sum
值加1,为每个TRUE
值加0,有效求和向量中的FALSE
值的数量。