I want to match $1 from one file against $1 in the other, then count the matches where $2(File1) < $2(File2) < $3(File1),
and do this for each match.
File 1: segments
Chromosome Start End Value
chr1 0 121347754 -0.009727287106215954
chr1 144009053 249250621 0.18180939555168152
chr2 0 90278124 -0.0197499617934227
chr2 95387134 243199373 -0.009399870410561562
chr3 0 91000000 -0.015508042648434639
chr3 93541117 198022430 0.011255052872002125
chr4 0 49064792 -0.02086501568555832
chr4 52700771 143350756 0.013872206211090088
chr4 143350756 191154276 -0.004134085960686207
File 2: probes
Chromosome Start End Value Array
chr1 798959 798959 1.0 0
chr1 1048955 1048955 0.0 0
chr1 1158277 1158277 0.0 0
chr1 1314015 1314015 0.5307189226150513 0
chr1 1489928 1489928 0.45127609372138977 0
chr1 1499298 1499298 1.0 0
chr1 1948400 1948400 0.0 0
chr1 2021114 2021114 0.0 0
chr1 2056735 2056735 0.0 0
So the output would be:
$1(matching both File 1 and 2) $2(File1) $3(File1) $4(number of matches)
Output
Chromosome Start End Probes
chr1 0 121347754 238
chr1 144009053 249250621 590
chr2 0 90278124 321
I have been trying to do this with awk and it isn't working!
Here is what I have so far:
awk 'FNR==NR{a[$1]=$1 FS $2;next}{ print $1[File1] "\t" $2[File1] "\t" $3[File1] "\t" $2[File1] < $2[File2] < $3[File1] }' File1 File2
Answer 0 (score: 1)
Another way using awk:
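awk 'BEGIN {print "Chromosome Start End Probes"}
# First file (file2, the probes): collect every probe start per chromosome
NR==FNR{a[$1]=a[$1]==""?$2:a[$1] FS $2;next}
# Second file (file1, the segments): count probes strictly inside each segment
{ delete c
# split returns the element count, so length(b) on an array is not needed
n=split(a[$1],b,FS)
for (i=1;i<=n;i++)
if (b[i]>$2&&b[i]<$3) c[$1]++
# segments with no probes inside are not printed
if (c[$1])print $1,$2,$3,c[$1]
}' file2 file1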
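BEGIN {print "Chromosome Start End Probes"}
prints the header.
NR==FNR{a[$1]=a[$1]==""?$2:a[$1] FS $2;next}
reads file2, appending each $2 value to array a under the key $1.
split(a[$1],b,FS)
splits the a[$1] value into array b.
if (b[i]>$2&&b[i]<$3) c[$1]++
counts each value that lies strictly between $2 and $3 of the current file1 line.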
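To check the counting logic on something small, here is a minimal self-contained sketch of the same idea. The file names (segments.txt, probes.txt) and the toy coordinates are made up for illustration and are not the data from the question. It is a variant rather than the exact script above: probe starts are stored under a (chromosome, index) key instead of a joined string, header lines are skipped explicitly, and segments with zero probes are printed too.

```shell
#!/bin/sh
# Toy data (hypothetical values, not the files from the question).
tmp=$(mktemp -d)
cat > "$tmp/segments.txt" <<'EOF'
Chromosome Start End Value
chr1 0 100 -0.01
chr1 150 300 0.18
chr2 0 50 -0.02
EOF
cat > "$tmp/probes.txt" <<'EOF'
Chromosome Start End Value Array
chr1 10 10 1.0 0
chr1 20 20 0.0 0
chr1 100 100 0.5 0
chr1 200 200 0.4 0
chr2 60 60 0.0 0
EOF

result=$(awk '
BEGIN { print "Chromosome Start End Probes" }
FNR == 1 { next }                  # skip the header line of each file
NR == FNR {                        # first file: probes
    n[$1]++                        # number of probes seen on this chromosome
    pos[$1, n[$1]] = $2            # probe start stored under (chromosome, index)
    next
}
{                                  # second file: segments
    count = 0
    for (i = 1; i <= n[$1]; i++)   # only probes on the same chromosome
        if (pos[$1, i] + 0 > $2 + 0 && pos[$1, i] + 0 < $3 + 0) count++
    print $1, $2, $3, count        # strict inequalities, zero counts included
}' "$tmp/probes.txt" "$tmp/segments.txt")
rm -rf "$tmp"
printf '%s\n' "$result"
```

With this toy input the script prints the header followed by `chr1 0 100 2`, `chr1 150 300 1` and `chr2 0 50 0`. The probe at position 100 is not counted for the first segment because both comparisons are strict, matching the condition $2(File1) < $2(File2) < $3(File1) in the question.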