I would like to use awk to combine these two data sets. I want every row where fileA.$1 equals fileB.$1 and the average of fileA.$4 and fileA.$5 lies between fileB.$2 and fileB.$3 (that is, fileA.$1 = fileB.$1 AND fileB.$2 < average(fileA.$4, fileA.$5) < fileB.$3). Could anyone help with this?
fileA
chr1 Mot TF 500 700 0.9893 target1 600
chr1 Mot TF 100 300 0.9893 target1 200
chr1 Mot TF 1000 2000 0.9893 target1 1500
chr2 Mot TF 500 700 0.9502 target2 600
fileB
chr1 500 1000
chr1 400 800
chr1 100 800
chr3 100 500
desired result
chr1 500 1000 chr1 Mot TF 500 700 0.9893 target1 600
chr1 400 800 chr1 Mot TF 500 700 0.9893 target1 600
chr1 100 800 chr1 Mot TF 500 700 0.9893 target1 600
chr1 100 800 chr1 Mot TF 100 300 0.9893 target1 200
Answer 0 (score: 1)
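#!/usr/bin/awk -f
BEGIN {
    # Assumes tab-separated input and output; drop this block for whitespace-separated files.
    FS = OFS = "\t"
}
# First file (fileA): store each row, its key, and the average of $4 and $5.
NR == FNR {
    a0[NR] = $0
    a1[NR] = $1
    av[NR] = ($4 + $5) / 2
    next
}
# Second file (fileB): print every stored fileA row whose average lies strictly between $2 and $3.
{
    for (i = 1; i in a0; ++i) {
        if (a1[i] == $1 && av[i] > $2 && av[i] < $3) {
            print $0, a0[i]
        }
    }
}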
Run it with:
awk -f script.awk fileA fileB
Output:
chr1 500 1000 chr1 Mot TF 500 700 0.9893 target1 600
chr1 400 800 chr1 Mot TF 500 700 0.9893 target1 600
chr1 100 800 chr1 Mot TF 500 700 0.9893 target1 600
chr1 100 800 chr1 Mot TF 100 300 0.9893 target1 200
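If fileA is large, the nested loop above compares every fileB row against every fileA row. A minimal sketch of one way to reduce that work, assuming whitespace-separated input (the default FS rather than the tab setting above): group the fileA rows by their first field so that each fileB row only scans rows with the same key. The array names n, row, and avg and the temporary directory are illustrative and not part of the original answer.

```shell
#!/bin/sh
# Sketch only: recreate the sample files from the question in a temporary directory.
tmp=$(mktemp -d)

cat > "$tmp/fileA" <<'EOF'
chr1 Mot TF 500 700 0.9893 target1 600
chr1 Mot TF 100 300 0.9893 target1 200
chr1 Mot TF 1000 2000 0.9893 target1 1500
chr2 Mot TF 500 700 0.9502 target2 600
EOF

cat > "$tmp/fileB" <<'EOF'
chr1 500 1000
chr1 400 800
chr1 100 800
chr3 100 500
EOF

result=$(awk '
# First file (fileA): count rows per key and store each row with its average.
NR == FNR {
    n[$1]++
    row[$1, n[$1]] = $0
    avg[$1, n[$1]] = ($4 + $5) / 2
    next
}
# Second file (fileB): scan only the fileA rows that share the same first field.
{
    for (i = 1; i <= n[$1]; i++)
        if (avg[$1, i] > $2 && avg[$1, i] < $3)
            print $0, row[$1, i]
}
' "$tmp/fileA" "$tmp/fileB")

printf '%s\n' "$result"
rm -rf "$tmp"
```

On the sample data this should print the same four lines as the desired result, in fileB order. Keys that never occur in fileA (chr3 here) have a count of zero, so the loop body is skipped for them.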
Answer 1 (score: 0)
If you are flexible about the output format:
join fileB fileA | awk '$2 < $NF && $NF < $3'
chr1 500 1000 Mot TF 500 700 0.9893 target1 600
chr1 400 800 Mot TF 500 700 0.9893 target1 600
chr1 100 800 Mot TF 500 700 0.9893 target1 600
chr1 100 800 Mot TF 100 300 0.9893 target1 200
join does not print the join column twice. I am assuming that the last field of fileA is already the average.
Otherwise:
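awk -v OFS='\t' '
    # First file (fileB): remember each interval, keyed by the whole line.
    NR==FNR {f1[$0] = $1; min[$0] = $2; max[$0] = $3; next}
    # Second file (fileA): compare the average of $4 and $5 with every stored interval.
    {
        avg = ($4 + $5) / 2
        for (b in f1) {    # iteration order is unspecified, so output order may vary
            if ($1 == f1[b] && min[b] < avg && avg < max[b]) {
                print b, $0
            }
        }
    }
' fileB fileA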
chr1 100 800 chr1 Mot TF 500 700 0.9893 target1 600
chr1 500 1000 chr1 Mot TF 500 700 0.9893 target1 600
chr1 400 800 chr1 Mot TF 500 700 0.9893 target1 600
chr1 100 800 chr1 Mot TF 100 300 0.9893 target1 200
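The join approach can also be sketched without relying on the last field of fileA: after the join, the start and end columns of fileA land in $6 and $7, so the average can be computed from them. join expects both inputs to be sorted on the join field, so this sketch sorts them first. The temporary file names are illustrative, and LC_ALL=C is set only so that sort and join agree on collation.

```shell
#!/bin/sh
# Sketch only: recreate the sample files from the question in a temporary directory.
LC_ALL=C
export LC_ALL
tmp=$(mktemp -d)

cat > "$tmp/fileA" <<'EOF'
chr1 Mot TF 500 700 0.9893 target1 600
chr1 Mot TF 100 300 0.9893 target1 200
chr1 Mot TF 1000 2000 0.9893 target1 1500
chr2 Mot TF 500 700 0.9502 target2 600
EOF

cat > "$tmp/fileB" <<'EOF'
chr1 500 1000
chr1 400 800
chr1 100 800
chr3 100 500
EOF

# join needs both inputs sorted on the join field (field 1 here).
sort -k1,1 "$tmp/fileA" > "$tmp/fileA.sorted"
sort -k1,1 "$tmp/fileB" > "$tmp/fileB.sorted"

# Joined layout: $1 key, $2 min, $3 max, $4 Mot, $5 TF, $6 start, $7 end, ...
result=$(join "$tmp/fileB.sorted" "$tmp/fileA.sorted" |
    awk '{ avg = ($6 + $7) / 2 } $2 < avg && avg < $3')

printf '%s\n' "$result"
rm -rf "$tmp"
```

This should select the same four pairs as the commands above, with the key printed once per line; the order of the lines depends on how sort arranges the rows within each key.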