I have two input files. The first (file1.txt):
a 11-23
b 33-39
c 40-45
d 46-58
and the second (file2.txt):
33-39
40-42
43-47
51-52
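I need to match the values in file2 against the ranges in the second column of file1 (checking for overlapping ranges), and I would like the output to look like this: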
b 33-39
c 40-42, 43-45
d 46-47, 51-52
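Note that 'd 46-47, 51-52' appears on the last line because the range 43-47 in file2 spans both c and d, so it is split between them.
Stack Overflow user karakfa kindly suggested the following: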
$ join -j 99 file1 file2 |
awk '$2==$3{print $1,$2; next} {split($2,a,"-"); split($3,b,"-")}
a[1]>=b[1] && a[2]<=b[2] || a[1]<=b[1] && a[2]>=b[2] {print $1,$2",",$3}'
It produces this output:
b 33-39
c 40-45, 40-42
d 46-58, 51-52
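However, the whole file1 ranges 'c 40-45' and 'd 46-58' are not what I want; I want only the overlapping parts. I also need the solution to be compatible with GNU Awk so that it runs on my Windows machine.
Answer 0: (score: 0)
A solution in TXR: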
@(do
   (defstruct interval nil
     lo hi
     (:postinit (self)
       (unless (< self.lo self.hi)
         (swap self.lo self.hi)))
     (:method print (self stream)
       (put-string `@{self.lo}-@{self.hi}` stream))
     (:method intersects (self other)
       (and (>= self.hi other.lo)
            (<= self.lo other.hi)))
     (:method clip (self other)
       (new interval
            lo (max self.lo other.lo)
            hi (min self.hi other.hi))))
   (defstruct tabentry nil
     label
     interval
     matches
     (:method insert (self other)
       (when self.interval.(intersects other)
         (push self.interval.(clip other) self.matches)))))
@(next "file1.txt")
@(collect :vars (tab))
@label @lo-@hi
@ (bind tab @(new tabentry
                  label label
                  interval (new interval
                                lo (int-str lo)
                                hi (int-str hi))))
@(end)
@(next "file2.txt")
@(repeat)
@lo-@hi
@ (do
    (let ((iv (new interval
                   lo (int-str lo)
                   hi (int-str hi))))
      (each ((te tab))
        te.(insert iv))))
@(end)
@(do (each ((te tab))
       (when te.matches
         (put-line `@{te.label} @{(reverse te.matches) ", "}`))))
Run it:
$ txr tab-range.txr
b 33-39
c 40-42, 43-45
d 46-47, 51-52
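Since the question asks for GNU Awk, the same intersect-and-clip idea could also be sketched in awk. This is only a minimal sketch, not part of the TXR answer: the function name `clip_ranges` is hypothetical, and it assumes every file1 line has the form `label lo-hi` and every file2 line the form `lo-hi`, with non-negative integer bounds.

```shell
# clip_ranges FILE1 FILE2  (hypothetical helper name)
# FILE1 lines: "label lo-hi"; FILE2 lines: "lo-hi" (assumed format).
clip_ranges() {
  awk '
    NR == FNR {                       # first file: remember each labelled range
        split($2, r, "-")
        label[++n] = $1
        lo[n] = r[1] + 0
        hi[n] = r[2] + 0
        next
    }
    {                                 # second file: test the range against every entry
        split($1, r, "-")
        qlo = r[1] + 0
        qhi = r[2] + 0
        for (i = 1; i <= n; i++) {
            if (qhi >= lo[i] && qlo <= hi[i]) {      # the two ranges intersect
                clo = (qlo > lo[i]) ? qlo : lo[i]    # clip to the larger lower bound
                chi = (qhi < hi[i]) ? qhi : hi[i]    # clip to the smaller upper bound
                sep = (out[i] == "") ? "" : ", "
                out[i] = out[i] sep clo "-" chi
            }
        }
    }
    END {                             # print only entries that collected a match
        for (i = 1; i <= n; i++)
            if (out[i] != "") print label[i], out[i]
    }
  ' "$1" "$2"
}
```

Calling `clip_ranges file1.txt file2.txt` on the sample data should print the three lines requested in the question. The single-quoted program assumes a POSIX-style shell; under Windows cmd.exe the quoting rules differ, so saving the awk program to a file and running it with `gawk -f` is probably easier.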