Question

I have a file with several numbers in a single column (numbers.txt).

I also have another text file that looks like this (input.txt):

#    126    2    0
bla    mjnb     kjh    ojj
#    5    65    0
kjh    jhgg    kjhkjh    juh
hgj    ikaw    esd     cdqw
#    100    3005    0
jhgjh    jh    jhjhg    pol

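The first line of each block (the line starting with #) is the important one: both numbers written after the # must be in numbers.txt. I have written the following code, but on my huge file it would take weeks. My numbers.txt contains about 2500 numbers.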

    #!/bin/bash
    cat numbers.txt | while read first
    do
    for second in $(cat numbers.txt)
    do
    awk -v RS="#" "/ $first    $second / {sub(/^ /,RS);print}" input.txt >> output1.txt
    done
    done

The output should be:

#    126    2    0
bla    mjnb     kjh    ojj
#    5    65    0
kjh    jhgg    kjhkjh    juh
hgj    ikaw    esd     cdqw

Can anyone suggest a faster way to get this output?

Answer 1

With awk you can do this without any nested loops:

awk 'FNR==NR{a[$0];next} $1=="#" && ($2 in a) && ($3 in a) {p=1}
           $1=="#" && (!($2 in a) || !($3 in a)) {p=0} p' numbers.txt input.txt

Output:

#    126    2    0
bla    mjnb     kjh    ojj
#    5    65    0
kjh    jhgg    kjhkjh    juh
hgj    ikaw    esd     cdqw
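
For a sense of why dropping the nested loop matters: the script in the question starts one awk process per pair of numbers, and each process rereads all of input.txt. A rough count, assuming the figure of about 2500 numbers given in the question (the variable names here are only illustrative):

```shell
#!/bin/sh
# Approximate size of numbers.txt, as stated in the question.
n=2500

# The nested loop runs awk once for every (first, second) pair.
runs=$((n * n))

echo "awk invocations in the nested loop: $runs"
```

The single-pass version reads each file once and only does two array lookups per # line, so its cost grows with the size of the files rather than with the square of the number count.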

Answer 2

awk '
    # read the numbers file into the array "num"
    NR == FNR {num[$1]; next} 

    # if this is a "#" line and the first 2 numbers are in "num" set a flag to "true"
    $1 == "#" {p = (($2 in num) && ($3 in num))} 

    # print the current line if the flag is true
    p
' numbers.txt input.txt
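
If you want to try this on the sample from the question, here is a minimal sketch. The question does not show what is in numbers.txt, so the list below is an assumption chosen to match the expected output (126, 2, 5 and 65 present, 3005 absent). It runs in a temporary directory so it will not touch existing files.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ASSUMPTION: hypothetical contents for numbers.txt (not shown in the question).
printf '%s\n' 126 2 5 65 100 > numbers.txt

# input.txt exactly as posted in the question.
cat > input.txt <<'EOF'
#    126    2    0
bla    mjnb     kjh    ojj
#    5    65    0
kjh    jhgg    kjhkjh    juh
hgj    ikaw    esd     cdqw
#    100    3005    0
jhgjh    jh    jhjhg    pol
EOF

# Same program as above: load the numbers, set the flag on each "#" line,
# and print every line while the flag is true.
awk '
    NR == FNR { num[$1]; next }
    $1 == "#" { p = (($2 in num) && ($3 in num)) }
    p
' numbers.txt input.txt > output.txt

cat output.txt
```

With that assumed numbers.txt, the last block is dropped because 3005 is not in the list, even though 100 is.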

Checking two variables in a text file
