awk: test whether a value in a column is less than or greater than other values, and combine the matching lines

Time: 2013-09-20 19:40:44

Tags: awk compare union

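I want to combine these two data sets with awk. I want every pair of lines where fileA.$1 and fileB.$1 are the same and the average of fileA.$4 and fileA.$5 lies between fileB.$2 and fileB.$3, that is: fileA.$1 == fileB.$1 AND fileB.$2 < (fileA.$4 + fileA.$5) / 2 < fileB.$3. Can anyone help with this?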

fileA                           
chr1    Mot TF  500 700 0.9893  target1 600
chr1    Mot TF  100 300 0.9893  target1 200
chr1    Mot TF  1000    2000    0.9893  target1 1500
chr2    Mot TF  500 700 0.9502  target2 600

fileB       
chr1    500 1000
chr1    400 800
chr1    100 800
chr3    100 500

desired result                              
chr1    500 1000    chr1    Mot TF  500 700 0.9893  target1 600
chr1    400 800 chr1    Mot TF  500 700 0.9893  target1 600
chr1    100 800 chr1    Mot TF  500 700 0.9893  target1 600
chr1    100 800 chr1    Mot TF  100 300 0.9893  target1 200

2 answers:

Answer 0 (score: 1)

#!/usr/bin/awk -f

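BEGIN {
    FS = OFS = "\t"    # input and output are tab-separated
}
# First file (fileA): store each line, its key, and the average of $4 and $5.
NR == FNR {
    a0[NR] = $0
    a1[NR] = $1
    av[NR] = ($4 + $5) / 2
    next
}
# Second file (fileB): print every stored fileA line with the same key
# whose average lies strictly between $2 and $3.
{
    for (i = 1; i in a0; ++i) {
        if (a1[i] == $1 && av[i] > $2 && av[i] < $3) {
            print $0, a0[i]
        }
    }
}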

Run it with:

awk -f script.awk fileA fileB

Output:

chr1    500 1000    chr1    Mot TF  500 700 0.9893  target1 600
chr1    400 800 chr1    Mot TF  500 700 0.9893  target1 600
chr1    100 800 chr1    Mot TF  500 700 0.9893  target1 600
chr1    100 800 chr1    Mot TF  100 300 0.9893  target1 200
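
The script above compares every fileB line against every stored fileA line. A possible variation, sketched here under the same tab-separated assumption, groups the fileA lines by their first field so that each fileB line is only checked against lines with the same key. The function name `join_by_avg` and the array names are made up for this illustration and are not part of the original answer.

```shell
# Hypothetical helper: join_by_avg FILE_A FILE_B
# Assumes both files are tab-separated, as in the answer above.
join_by_avg() {
    awk '
    BEGIN { FS = OFS = "\t" }
    # First file: bucket each line under its key ($1) with a per-key counter.
    NR == FNR {
        n[$1]++
        row[$1, n[$1]] = $0
        avg[$1, n[$1]] = ($4 + $5) / 2
        next
    }
    # Second file: scan only the bucket that shares this key.
    {
        for (i = 1; i <= n[$1] + 0; ++i)
            if (avg[$1, i] > $2 && avg[$1, i] < $3)
                print $0, row[$1, i]
    }
    ' "$1" "$2"
}
```

Calling `join_by_avg fileA fileB` on the sample data should give the same four lines as above, in the same order, since fileB is still read line by line and each bucket keeps the fileA input order.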

Answer 1 (score: 0)

If you are flexible about the output format:

join fileB fileA | awk '$2 < $NF && $NF < $3' 
chr1 500 1000 Mot TF 500 700 0.9893 target1 600
chr1 400 800 Mot TF 500 700 0.9893 target1 600
chr1 100 800 Mot TF 500 700 0.9893 target1 600
chr1 100 800 Mot TF 100 300 0.9893 target1 200

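join does not print the join column twice. I am assuming that the last field of fileA is already the average.

If it is not, compute the average from fields 4 and 5 (the for-in loop visits the stored fileB lines in an unspecified order, so the output order may vary):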

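awk -v OFS='\t' '
    # First file (fileB): key each line by its full text; keep its key and bounds.
    NR==FNR {f1[$0] = $1; min[$0] = $2; max[$0] = $3; next}
    # Second file (fileA): compare the average of $4 and $5 with every stored range.
    {
        avg=($4+$5)/2
        for (b in f1) {
            if ($1 == f1[b] && min[b] < avg && avg < max[b]) {
                print b, $0
            }
        }
    }
' fileB fileA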
chr1    100 800 chr1    Mot TF  500 700 0.9893  target1 600
chr1    500 1000    chr1    Mot TF  500 700 0.9893  target1 600
chr1    400 800 chr1    Mot TF  500 700 0.9893  target1 600
chr1    100 800 chr1    Mot TF  100 300 0.9893  target1 200
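
One caveat on the join one-liner in this answer: join expects both inputs to be sorted on the join field, which happens to hold for the sample files. A rough sketch of a wrapper that sorts first and also computes the average from the joined fields, rather than trusting the last column, could look like this. The name `sorted_join_filter` and the temporary file names are invented for the example; after the join, fileA's $4 and $5 land in fields 6 and 7.

```shell
# Hypothetical wrapper: sorted_join_filter FILE_A FILE_B
sorted_join_filter() {
    tmpdir=$(mktemp -d) || return 1
    # Sort both inputs on field 1; LC_ALL=C keeps sort and join collation consistent.
    LC_ALL=C sort -k1,1 "$1" > "$tmpdir/a.sorted"
    LC_ALL=C sort -k1,1 "$2" > "$tmpdir/b.sorted"
    # Joined line layout: key, fileB $2, fileB $3, then fileA $2..$8.
    LC_ALL=C join "$tmpdir/b.sorted" "$tmpdir/a.sorted" |
        awk '{ avg = ($6 + $7) / 2 } $2 < avg && avg < $3'
    rm -rf "$tmpdir"
}
```

Used as `sorted_join_filter fileA fileB`. The output is space-separated, like the plain join version, and the lines follow the sorted order of fileB rather than its original order.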