awk: count matches greater or less than column values across two files

Time: 2014-02-18 17:14:21

Tags: bash awk

I want to match $1 in one file against $1 in the other, then count the rows where $2(File1) < $2(File2) < $3(File1), and do this for every row of File1.

File 1

Chromosome  Start   End Value
chr1    0   121347754   -0.009727287106215954
chr1    144009053   249250621   0.18180939555168152
chr2    0   90278124    -0.0197499617934227
chr2    95387134    243199373   -0.009399870410561562
chr3    0   91000000    -0.015508042648434639
chr3    93541117    198022430   0.011255052872002125
chr4    0   49064792    -0.02086501568555832
chr4    52700771    143350756   0.013872206211090088
chr4    143350756   191154276   -0.004134085960686207

File 2 (probes)

Chromosome  Start   End Value   Array
chr1    798959  798959  1.0 0
chr1    1048955 1048955 0.0 0
chr1    1158277 1158277 0.0 0
chr1    1314015 1314015 0.5307189226150513  0
chr1    1489928 1489928 0.45127609372138977 0
chr1    1499298 1499298 1.0 0
chr1    1948400 1948400 0.0 0
chr1    2021114 2021114 0.0 0
chr1    2056735 2056735 0.0 0

So the output would be:

$1(matching both File 1 and 2) $2(File1) $3(File1) $4(number of matches)

Output

Chromosome  Start   End Probes
chr1    0   121347754   238
chr1    144009053   249250621   590
chr2    0   90278124    321

I have been trying to do this with awk and it is not working!

This is what I have so far:

awk 'FNR==NR{a[$1]=$1 FS $2;next}{ print $1[File1] "\t" $2[File1] "\t" $3[File1] "\t" $2[File1] < $2[File2] < $3[File1]  }' File1 File2

1 answer:

Answer 0 (score: 1)

Another way, using awk:

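awk 'BEGIN {print "Chromosome  Start   End Probes"}
# First file (file2): collect the probe Start positions per chromosome
NR==FNR{a[$1]=a[$1]==""?$2:a[$1] FS $2;next}
# Second file (file1): count the probes strictly inside each segment
{ delete c
  n=split(a[$1],b,FS)
  for (i=1;i<=n;i++)
       if (b[i]>$2&&b[i]<$3) c[$1]++
  if (c[$1])print $1,$2,$3,c[$1]
}' file2 file1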

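Explanation

  • BEGIN {print "Chromosome Start End Probes"} prints the header.
  • NR==FNR{a[$1]=a[$1]==""?$2:a[$1] FS $2;next} runs while file2 is being read: it appends each probe's $2 to array a under the key $1, separated by FS.
  • split(a[$1],b,FS) splits the value stored in a[$1] into array b, one probe position per element.
  • if (b[i]>$2&&b[i]<$3) c[$1]++ counts the probe when its position lies strictly between the Start and End of the current file1 row. Rows with no matching probes are not printed.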
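
The steps above can be tried on a small excerpt of the question's data. This is only a sketch under a few assumptions: the function name count_probes and the file names probes.txt and segments.txt are made up for illustration, and this variant differs from the script above in that it skips the header rows explicitly, forces numeric comparison with + 0, writes tab-separated output, and also prints segments with zero probes.

```shell
#!/bin/sh
# Sketch only: count_probes, probes.txt and segments.txt are illustrative names.

# count_probes PROBES SEGMENTS
# Prints one tab-separated line per segment with the number of probes whose
# Start lies strictly between the segment Start and End.
count_probes() {
  awk '
    BEGIN { OFS = "\t"; print "Chromosome", "Start", "End", "Probes" }
    FNR == 1 { next }                 # skip the header line of each file
    NR == FNR {                       # first file: probes
      a[$1] = (a[$1] == "" ? $2 : a[$1] FS $2)
      next
    }
    {                                 # second file: segments
      count = 0
      n = split(a[$1], b, FS)         # n is the number of stored positions
      for (i = 1; i <= n; i++)
        if (b[i] + 0 > $2 + 0 && b[i] + 0 < $3 + 0) count++
      print $1, $2, $3, count         # zero counts are printed too
    }
  ' "$1" "$2"
}

tmp=$(mktemp -d)

# First three segments of File 1 from the question
cat > "$tmp/segments.txt" <<'EOF'
Chromosome Start End Value
chr1 0 121347754 -0.009727287106215954
chr1 144009053 249250621 0.18180939555168152
chr2 0 90278124 -0.0197499617934227
EOF

# The nine probes of File 2 shown in the question
cat > "$tmp/probes.txt" <<'EOF'
Chromosome Start End Value Array
chr1 798959 798959 1.0 0
chr1 1048955 1048955 0.0 0
chr1 1158277 1158277 0.0 0
chr1 1314015 1314015 0.5307189226150513 0
chr1 1489928 1489928 0.45127609372138977 0
chr1 1499298 1499298 1.0 0
chr1 1948400 1948400 0.0 0
chr1 2021114 2021114 0.0 0
chr1 2056735 2056735 0.0 0
EOF

result=$(count_probes "$tmp/probes.txt" "$tmp/segments.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

On this excerpt the first chr1 segment reports 9 probes, because all nine listed positions lie between 0 and 121347754, while the other two segments report 0. The counts in the question's expected output (238, 590, 321) come from the full probe file, which is not shown here.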