Question

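I want to efficiently search for ~200 filenames across a few hundred log files.

I can do this easily with grep's {{1}} option, with the needles placed in a file.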

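However, there are a few problems:

I'm interested in doing this efficiently, as in How to use grep efficiently?
I want to know all matches for each search term (i.e. filename) across all log files, separately. grep -f simply reports matches as it finds needles in each file, so the results are not grouped by needle.
I want to know when a filename fails to match anything.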

2.7 i7 MBP w/ 16 GB RAM

Using grep -f gives me:

grep -ron -f needle *

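access_log-2013-01-01:88298:google
access_log-2013-01-01:88304:google
access_log-2013-01-01:88320:test
access_log-2013-01-01:88336:google
access_log-2013-01-02:396244:test
access_log-2013-01-02:396256:google
access_log-2013-01-02:396262:google

where needle contains:

google
test

The problem here is that the whole directory is searched for any match from needle, and the process is single-threaded, so it takes forever. There is also no explicit information about needles that fail to match.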

Answer 1

How about combining grep and find in a bash script?

# Word-splits needles.txt, so each needle must be free of whitespace.
for needle in $(cat needles.txt); do
    echo "$needle"
    matches=$(find . -type f -exec grep -nH -e "$needle" {} +)
    if [[ 0 == $? ]] ; then
        if [[ -z "$matches" ]] ; then
            echo "No matches found"
        else
            echo "$matches"
        fi
    else
        echo "Search failed / no matches"
    fi
    echo
done

needles.txt contains the list of target filenames.
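
The loop above prints one combined message, "Search failed / no matches", because find returns a non-zero status both when grep hits an error and when grep simply finds nothing. As a minimal sketch, grep's own exit status can separate the two cases: POSIX specifies 0 when at least one line was selected, 1 when none was, and a value greater than 1 on error. The helper name check_needle and the use of -F (fixed-string matching) are assumptions for illustration, not part of the answer.

```shell
# check_needle NEEDLE FILE...
# Hypothetical helper: classify one search by grep's exit status.
check_needle() {
    local needle=$1
    shift
    # -q: print nothing, only set the exit status.
    # -F: treat the needle as a literal string, not a regular expression.
    grep -qF -e "$needle" -- "$@" 2>/dev/null
    case $? in
        0) echo "match" ;;
        1) echo "no match" ;;
        *) echo "search failed" ;;
    esac
}
```

Calling check_needle google access_log-2013-01-01 would print one of the three words, which a wrapper loop could use to report failures and misses separately.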

To read the needles from the file line by line (so that they can now contain spaces), use this version:

# IFS= and -r keep leading/trailing whitespace and backslashes in each needle intact.
cat needles.txt | while IFS= read -r needle ; do
    echo "$needle"
    matches=$(find . -type f -exec grep -nH -e "$needle" {} +)
    if [[ 0 == $? ]] ; then
        if [[ -z "$matches" ]] ; then
            echo "No matches found"
        else
            echo "$matches"
        fi
    else
        echo "Search failed / no matches"
    fi
    echo
done

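If you combine this with xargs, the exit code $? is no longer zero even on success: with -n1 grep runs once per file, it exits non-zero for every file that contains no match, and xargs reports a non-zero status if any invocation did. Relying only on the output may be less safe, but it works for me: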

cat needles.txt | while IFS= read -r needle ; do
  echo "$needle"
  matches=$(find . -type f -print0 | xargs -0 -n1 -P2 grep -nH -e "$needle")
  if [[ -z "$matches" ]] ; then
        echo "No matches found"
  else
        echo "$matches"
  fi
  echo
done
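
As a hedged variation on the xargs loop above, the sketch below wraps the search in two small functions and decides between "match" and "no match" from the output alone, since the exit status is not reliable there. The function names search_needle and report, the -P4 job count, and the use of -F (so that dots in filenames are matched literally) are assumptions, not part of the answer. Note that -P is available in GNU and BSD xargs but is not in POSIX.

```shell
# search_needle NEEDLE DIR
# Hypothetical helper: search every file under DIR for one literal string,
# letting xargs run up to four grep processes in parallel.
search_needle() {
    local needle=$1 dir=$2
    find "$dir" -type f -print0 | xargs -0 -P4 grep -nHF -e "$needle"
}

# report NEEDLES_FILE DIR
# Prints a header per needle, followed by its matches or "No matches found".
report() {
    local needles_file=$1 dir=$2 needle matches
    while IFS= read -r needle; do
        [ -n "$needle" ] || continue        # skip blank lines
        matches=$(search_needle "$needle" "$dir")
        if [ -n "$matches" ]; then
            printf '== %s ==\n%s\n' "$needle" "$matches"
        else
            printf '== %s ==\nNo matches found\n' "$needle"
        fi
    done < "$needles_file"
}
```

Keep needles.txt outside the directory being searched; otherwise every needle matches its own line in that file.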

Answer 2

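To determine which needles no longer match, you can take the output from grep and:

1. Use awk or something similar to extract the matched strings into a separate file.
2. sort --uniq filename -o temp1
3. Concatenate the needle file to temp1.
4. sort temp1 -o temp2
5. uniq -u temp2 > temp3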

temp3 will contain the needles that are no longer used.
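
The sequence above can be sketched as one function. This is only an illustration under stated assumptions: the name find_unused is hypothetical, cut stands in for "awk or something similar", log file names are assumed to contain no colon, and the needle file is assumed to have no duplicate lines (a needle listed twice would be dropped by uniq -u even if it never matched).

```shell
# find_unused GREP_OUTPUT NEEDLES
# GREP_OUTPUT holds lines such as access_log-2013-01-01:88298:google
# (file:line:match, as printed by grep -ron -f needle *).
find_unused() {
    local grep_output=$1 needles=$2 work
    work=$(mktemp -d)
    # Extract the matched string: everything after the second colon.
    cut -d: -f3- "$grep_output" > "$work/matched"
    # Reduce to the unique needles that were found (-u is the POSIX spelling of --uniq).
    sort -u -o "$work/temp1" "$work/matched"
    # Append the full needle list; every found needle now occurs twice.
    cat "$needles" >> "$work/temp1"
    # Sort, then keep only the lines that occur exactly once.
    sort -o "$work/temp2" "$work/temp1"
    uniq -u "$work/temp2" > "$work/temp3"
    cat "$work/temp3"
    rm -rf "$work"
}
```

A call such as find_unused grep_output.txt needle would print the unused needles, one per line.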

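There may be a more concise way to do this. The extraction and sort --uniq steps produce the list of unique needles that were found in the files.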

Say your needle file contains:

google
foo
bar

grep finds foo and bar in several files, but does not find google. Step 1 will create a file containing:

foo
bar
bar
foo
foo
bar
foo

sort --uniq will create (in sorted order):

bar
foo

Concatenating the needle file gives:

bar
foo
google
foo
bar

Sorting gives:

bar
bar
foo
foo
google

The final uniq -u command will output a single line:

google
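
For the example above, the same answer can be reached a little more directly with comm, which compares two sorted files. This is an alternative to the sort and uniq -u sequence rather than the answer's own method; the helper name unmatched_needles and its two arguments are assumptions. Unlike uniq -u, it only ever prints lines that come from the needle file.

```shell
# unmatched_needles NEEDLES MATCHED
# NEEDLES: one needle per line.
# MATCHED: the matched strings extracted from grep's output (duplicates allowed).
unmatched_needles() {
    local needles_sorted found_sorted
    needles_sorted=$(mktemp)
    found_sorted=$(mktemp)
    sort -u "$1" > "$needles_sorted"
    sort -u "$2" > "$found_sorted"
    # comm -23 suppresses lines unique to the second file and lines common
    # to both, leaving only the needles that were never matched.
    comm -23 "$needles_sorted" "$found_sorted"
    rm -f "$needles_sorted" "$found_sorted"
}
```

With a needle file containing google, foo and bar, and a matched-strings file containing only foo and bar lines, this prints google.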
