Common lines between 2 text files on the Linux command line

Time: 2017-12-14 10:14:42

Tags: awk

I have 2 text files. The first one looks like this:

DB  41533499    41533500    14
CD  41533500    41533501    3
AR  41533504    41533505    5
DR  41533506    41533507    3
AR  41533508    41533509    1
AR  48743349    48743350    1

and the second one looks like this:

DB  41533400    41533600
DR  41533300    41533800
AR  41533200    41533800
AR  48743100    48743983

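In the first file the difference between columns 2 and 3 is always 1, which means each row is a single point. I want to create a new file containing the rows whose column 1 is common to both files and whose range (columns 2 and 3) in file1 lies within the range (columns 2 and 3) of a row in file2. This is the expected output: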

DB  41533400    41533600    41533499    41533500    14
DR  41533300    41533800    41533506    41533507    3
AR  41533200    41533800    41533508    41533509    1
AR  48743100    48743983    48743349    48743350    1

I tried the following on the Linux command line, but it does not give me what I want:

awk '{print $1 "\t" $2 "\t" $3 "\t" }' file2.txt '{print $1 "\t" $2 "\t" $3 "\t" $4 }' file1.txt > output.txt

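Do you know how to fix this?

2 answers:

Answer 0: (score: 1)

Here is one for GNU awk, but I share the same question as @RomanPerekhrest about the record AR 41533504 41533505 5: it falls inside the AR range 41533200 to 41533800, yet it is missing from the expected output.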

$ awk 'NR==FNR{
    a[$1][$2]=$3; next
}
($1 in a) {
    for(i in a[$1])
        if($2>=i+0 && $3 <= a[$1][i])   # i+0: subscripts are strings, so force a numeric comparison
            print $1,i,a[$1][i],$2,$3,$4
}' file2 file1
DB 41533400 41533600 41533499 41533500 14
AR 41533200 41533800 41533504 41533505 5
DR 41533300 41533800 41533506 41533507 3
AR 41533200 41533800 41533508 41533509 1
AR 48743100 48743983 48743349 48743350 1
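
The script above relies on arrays of arrays, which are specific to GNU awk. For other awks (mawk, BusyBox awk), the same idea can be sketched with simulated multidimensional subscripts and a per-key counter. This is only a sketch under assumptions: the array names n, lo and hi and the output file name portable_output.txt are made up for illustration, and the range test is inclusive, as in the script above.

```shell
# Recreate the sample inputs from the question.
cat > file1 <<'EOF'
DB  41533499    41533500    14
CD  41533500    41533501    3
AR  41533504    41533505    5
DR  41533506    41533507    3
AR  41533508    41533509    1
AR  48743349    48743350    1
EOF

cat > file2 <<'EOF'
DB  41533400    41533600
DR  41533300    41533800
AR  41533200    41533800
AR  48743100    48743983
EOF

awk '
NR == FNR {                      # first file (file2): remember every range per key
    n[$1]++
    lo[$1, n[$1]] = $2
    hi[$1, n[$1]] = $3
    next
}
$1 in n {                        # second file (file1): test the point against each range
    for (j = 1; j <= n[$1]; j++)
        if ($2 + 0 >= lo[$1, j] + 0 && $3 + 0 <= hi[$1, j] + 0)
            print $1, lo[$1, j], hi[$1, j], $2, $3, $4
}' file2 file1 > portable_output.txt
```

On the sample data this should write the same five lines as the GNU awk version, in file1 order, including the AR 41533504 record.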

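Answer 1: (score: 0)

This is based on my liberal interpretation of the requirements, inferred from the missing line.

It uses a pipeline instead of a single awk script (which has already been answered):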

$ join <(sort file2) <(sort file1) | # sort and join on key (1st field)
  awk '$2<$4 && $3>$5'             | # apply within range logic
  sort -k6n                        | # sort ascending based on last field
  awk '!a[$2]++'                   | # pick first instance of 2nd field (the lowest) 
  tac                                # reverse to be in descending order


DB 41533400 41533600 41533499 41533500 14
DR 41533300 41533800 41533506 41533507 3
AR 48743100 48743983 48743349 48743350 1
AR 41533200 41533800 41533508 41533509 1
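
If the order of the expected output matters, the same interpretation can also be sketched in one awk pass that prints the ranges in file2 order. The big assumption here, taken from this answer and not confirmed by the question, is that when several points fall inside one range, the point with the smallest 4th column wins. The array names and the output file name lowest_output.txt are made up for illustration, and the range test is inclusive, unlike the strict comparison in the pipeline (the two agree on the sample data).

```shell
# Recreate the sample inputs from the question.
cat > file1 <<'EOF'
DB  41533499    41533500    14
CD  41533500    41533501    3
AR  41533504    41533505    5
DR  41533506    41533507    3
AR  41533508    41533509    1
AR  48743349    48743350    1
EOF

cat > file2 <<'EOF'
DB  41533400    41533600
DR  41533300    41533800
AR  41533200    41533800
AR  48743100    48743983
EOF

awk '
NR == FNR {                       # file2: store the ranges in input order
    key[++r] = $1; lo[r] = $2; hi[r] = $3
    next
}
{                                 # file1: look for ranges that contain this point
    for (j = 1; j <= r; j++) {
        if ($1 != key[j] || $2 + 0 < lo[j] + 0 || $3 + 0 > hi[j] + 0)
            continue
        # ASSUMPTION: per range, keep the point with the smallest 4th column
        if (!(j in best) || $4 + 0 < cnt[j]) {
            best[j] = $2 OFS $3 OFS $4
            cnt[j] = $4 + 0
        }
    }
}
END {                             # one line per matched range, in file2 order
    for (j = 1; j <= r; j++)
        if (j in best)
            print key[j], lo[j], hi[j], best[j]
}' file2 file1 > lowest_output.txt
```

On the sample data this should give the four expected lines in the order shown in the question (DB, DR, AR 41533200, AR 48743100), separated by single spaces.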