Question

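This question is an extension of my previous question. My search requirements are as follows:

The strings to search for are stored in the file values.txt (the input file), which contains, for example, the following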

string1  1
string2  3
string3  5

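where the first column (string1, string2, string3) holds the strings to search for, and the second column gives the number of occurrences of each string to find.
In addition, the search needs to run recursively over files with specific file extensions (e.g. .out, .txt, etc.).
The search output should be directed to a file that lists the matching lines along with the name and path of each searched file.

For example, typical output should look like the following (for a recursive search of files with the .out extension)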

<path_of_searched_file1/fileName1.out>
The full line containing the <first> instance of <string1>
The full line containing the <first> instance of <string2>
The full line containing the <second> instance of <string2>
The full line containing the <third> instance of <string2>
The full line containing the <first> instance of <string3>
The full line containing the <second> instance of <string3>
The full line containing the <third> instance of <string3>
The full line containing the <fourth> instance of <string3>
The full line containing the <fifth> instance of <string3>


<path_of_searched_file2/fileName2.out>
The full line containing the <first> instance of <string1>
The full line containing the <first> instance of <string2>
The full line containing the <second> instance of <string2>
The full line containing the <third> instance of <string2>
The full line containing the <first> instance of <string3>
The full line containing the <second> instance of <string3>
The full line containing the <third> instance of <string3>
The full line containing the <fourth> instance of <string3>
The full line containing the <fifth> instance of <string3>


and so on

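Is awk the best way to solve this search problem? If so, could someone help me modify the awk code given in the previous question to meet my current search requirements?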

Answer 1

Here's one way using awk; YMMV. Run like:

awk -f ./script.awk values.file $(find . -type f -regex ".*\.\(txt\|doc\|etc\)$")

Contents of script.awk:

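# Block 1: while reading values.file (the first argument), map each search string to its count.
FNR==NR {
    a[$1]=$2;
    next
}

# Block 2: at the first line of every other file, copy the counts so each file starts fresh.
FNR==1 {
    for (i in a) {
        b[i]=a[i]
    }
}

# Block 3: print a matching line while that string still has occurrences left for this file.
{
    for (j in b) {
        if ($0 ~ j && b[j]-- > 0) {
            # Parenthesized because an unparenthesized concatenation after ">" is parsed differently across awk implementations.
            print > (FILENAME ".out")
        }
    }
}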

Alternatively, here's the one-liner:

awk 'FNR==NR { a[$1]=$2; next } FNR==1 { for (i in a) b[i]=a[i] } { for (j in b) if ($0 ~ j && b[j]-- > 0) print > (FILENAME ".out") }' values.file $(find . -type f -regex ".*\.\(txt\|doc\)$")
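One caveat with the $(find ...) form: the shell splits the substituted list on whitespace, so paths containing spaces break. Here is a minimal sketch of an alternative that lets find hand the paths to awk directly. It assumes script.awk and values.file sit in the current directory; the function name search_tree is hypothetical, and the extensions are just the ones used above.

```shell
# Hypothetical helper: search_tree DIR
# find passes the matching paths to awk itself, so names with spaces survive.
# values.file comes before {} and is therefore read first in every batch
# that find starts, which keeps the FNR==NR test in script.awk valid.
search_tree() {
    find "$1" -type f \( -name '*.txt' -o -name '*.doc' \) \
        -exec awk -f ./script.awk values.file {} +
}
```

Run it as: search_tree .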

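Explanation:

In the first block, an associative array is created with the first column of values.file as keys and the second column as values. The second and third blocks read the files found by the find command. In the second block, the array formed in the first block is duplicated for each file found (there's no easy way to do this with awk, so perhaps Perl and the File::Find::Rule module would be a better choice?). In the third block, we loop over each key, search for the string and decrement its value, printing each match to a file named after the input file with a ".out" extension appended.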
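Note that the script above writes one ".out" file per searched file, whereas the question asks for a single report in which each file's path is followed by its matching lines. Here is a rough sketch of that variant, not a drop-in replacement. The function name search_report and its two arguments are hypothetical. It assumes the values file is not itself matched by the find pattern, that strings should match literally (index() rather than a regex), and that lines should be grouped per string in values-file order, as in the sample output.

```shell
# Hypothetical helper: search_report DIR VALUES_FILE
# Prints, for every *.out file under DIR with at least one hit, its path
# followed by the matching lines grouped per search string, then a blank line.
search_report() {
    find "$1" -type f -name '*.out' -exec awk '
        # Print the buffered matches of the file just finished.
        function emit(    i, body) {
            body = ""
            for (i = 1; i <= n; i++) body = body hits[order[i]]
            if (body != "") printf "%s\n%s\n", file, body
        }
        # Values file (always the first argument): record string, count, order.
        FNR == NR { want[$1] = $2; order[++n] = $1; next }
        # First line of a searched file: emit the previous file, reset counters.
        FNR == 1 {
            emit()
            file = FILENAME
            for (i = 1; i <= n; i++) {
                left[order[i]] = want[order[i]]
                hits[order[i]] = ""
            }
        }
        {
            for (i = 1; i <= n; i++) {
                s = order[i]
                # index() is a literal substring test, unlike "$0 ~ s".
                if (left[s] > 0 && index($0, s)) {
                    hits[s] = hits[s] $0 "\n"
                    left[s]--
                }
            }
        }
        # The last file has no following FNR == 1, so emit it here.
        END { emit() }
    ' "$2" {} +
}
```

Usage would be something like: search_report . values.txt > report.txt. The report name should not end in .out, or a later run would search the report itself.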
