Comparing file columns with awk (gawk), with refinement

Date: 2016-08-03 07:32:19

Tags: awk

My two input files are (FILE1.TXT):

a   11-23
b   33-39
c   40-45
d   46-58

and (FILE2.TXT):

33-39
40-42
43-47
51-52

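I need to match the file2 values against the second column of file1 (checking which ranges they fall within), and I want output like this: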

b   33-39
c   40-42, 43-45
d   46-47, 51-52

Note that 'd 46-47, 51-52' appears in the last line because the 43-47 range in file2 spans both c and d.

Stack Overflow user karakfa elegantly suggested the following:

$ join -j 99 file1 file2 | 
  awk '$2==$3{print $1,$2; next} {split($2,a,"-"); split($3,b,"-")}
   a[1]>=b[1] && a[2]<=b[2] || a[1]<=b[1] && a[2]>=b[2] {print $1,$2",",$3}'

It produces this output:

b 33-39
c 40-45, 40-42
d 46-58, 51-52

However, the full range values 'c 40-45' and 'd 46-58' are not the ranges I want. I also need the solution to be compatible with GNU Awk so that it runs on my Windows machine.

1 answer:

Answer 0 (score: 0)

A solution in TXR:

@(do
   (defstruct interval nil
     lo hi

     (:postinit (self)
       (unless (< self.lo self.hi)
         (swap self.lo self.hi)))

     (:method print (self stream)
       (put-string `@{self.lo}-@{self.hi}` stream))

     (:method intersects (self other)
       (and (>= self.hi other.lo)
            (<= self.lo other.hi)))

     (:method clip (self other)
       (new interval
            lo (max self.lo other.lo)
            hi (min self.hi other.hi))))

   (defstruct tabentry nil
     label
     interval
     matches

     (:method insert (self other)
       (when self.interval.(intersects other)
         (push self.interval.(clip other) self.matches)))))
@(next "file1.txt")
@(collect :vars (tab))
@label @lo-@hi
@  (bind tab @(new tabentry
                   label label
                   interval (new interval
                                 lo (int-str lo)
                                 hi (int-str hi))))
@(end)
@(next "file2.txt")
@(repeat)
@lo-@hi
@  (do
     (let ((iv (new interval
                    lo (int-str lo)
                    hi (int-str hi))))
       (each ((te tab))
         te.(insert iv))))
@(end)
@(do (each ((te tab))
       (when te.matches
         (put-line `@{te.label} @{(reverse te.matches) ", "}`))))

Run:

$ txr tab-range.txr 
b 33-39
c 40-42, 43-45
d 46-47, 51-52