Question

My two input files: (FILE1.TXT)

a   11-23
b   33-39
c   40-45
d   46-58

and (FILE2.TXT)

I need to match the file2 values against the second column of file1 (checking which ranges they fall into) and want output like this:

b   33-39
c   40-42, 43-45
d   46-47, 51-52

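Note that 'd 46-47, 51-52' appears on the last line because the range 43-47 in file2 overlaps both c (40-45) and d (46-58), so it is split into 43-45 for c and 46-47 for d.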

Stack Overflow user karakfa kindly suggested the following:

$ join -j 99 file1 file2 | 
  awk '$2==$3{print $1,$2; next} {split($2,a,"-"); split($3,b,"-")}
   a[1]>=b[1] && a[2]<=b[2] || a[1]<=b[1] && a[2]>=b[2] {print $1,$2",",$3}'

which produces this output:

b 33-39
c 40-45, 40-42
d 46-58, 51-52

However, the full ranges 'c 40-45' and 'd 46-58' are not what I want, and I need the solution to be compatible with GNU Awk so that it runs on my Windows machine.
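karakfa's condition only accepts a file2 range that lies entirely inside the file1 range or entirely contains it, and it prints the whole file1 range in front of each match. That is why 43-47, which straddles the boundary between c and d, never shows up. Writing the file1 range as [l_1, h_1] and a file2 range as [l_2, h_2], the desired output instead needs the general overlap test for closed integer ranges, followed by clipping to the intersection:

```latex
\begin{aligned}
[l_1, h_1] \cap [l_2, h_2] \neq \varnothing &\iff l_2 \le h_1 \ \text{and}\ l_1 \le h_2 \\
[l_1, h_1] \cap [l_2, h_2] &= [\max(l_1, l_2),\ \min(h_1, h_2)] \quad \text{(when they overlap)} \\
\text{c: } [40, 45] \cap [43, 47] &= [43, 45] \\
\text{d: } [46, 58] \cap [43, 47] &= [46, 47]
\end{aligned}
```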

Answer 1

A solution in TXR:

@(do
   (defstruct interval nil
     lo hi

     (:postinit (self)
       (unless (< self.lo self.hi)
         (swap self.lo self.hi)))

     (:method print (self stream)
       (put-string `@{self.lo}-@{self.hi}` stream))

     (:method intersects (self other)
       (and (>= self.hi other.lo)
            (<= self.lo other.hi)))

     (:method clip (self other)
       (new interval
            lo (max self.lo other.lo)
            hi (min self.hi other.hi))))

   (defstruct tabentry nil
     label
     interval
     matches

     (:method insert (self other)
       (when self.interval.(intersects other)
         (push self.interval.(clip other) self.matches)))))
@(next "file1.txt")
@(collect :vars (tab))
@label @lo-@hi
@  (bind tab @(new tabentry
                   label label
                   interval (new interval
                                 lo (int-str lo)
                                 hi (int-str hi))))
@(end)
@(next "file2.txt")
@(repeat)
@lo-@hi
@  (do
     (let ((iv (new interval
                    lo (int-str lo)
                    hi (int-str hi))))
       (each ((te tab))
         te.(insert iv))))
@(end)
@(do (each ((te tab))
       (when te.matches
         (put-line `@{te.label} @{(reverse te.matches) ", "}`))))

Running it:

$ txr tab-range.txr 
b 33-39
c 40-42, 43-45
d 46-47, 51-52

File column comparison with awk (gawk), with refinement
