Question

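I have a directory where hundreds of files are continuously being written. I want to grep these files for a pattern, then grep those results for pattern2, and write the lines containing pattern2 to a separate file. I am using grep to do this.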

grep pattern /dir/* | awk '{$1 = ""; print $0}' | grep pattern2 > "$mydir/myDATA.txt"

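The problem is the files that are still being written. I run the grep above as part of a shell script that will run from cron, collecting data perhaps every 5 minutes. How do I make my script skip the files it has already checked? The other piece of the script takes the myDATA.txt file and cuts it up to get the output I want in a particular format.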

 array=$(tr "," "\n" < /dir/myDATA.txt)
 for x in $array
 do
     # bunch of stuff.
     :
 done

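I have that part pretty much nailed down. The only problem is the files being written. So I want my script to look at the files in the directory, skip the ones it has already seen, run the grep command with its output going to a file, and then turn that file into cleaned-up, customized output.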

Answer 1

You could:

1) create a list of the existing files in the directory filtered by a timestamp or a list of previously checked files
2) check through the files in a loop one by one
3) as you check each file either add its name to a "done" list in another file or perhaps "touch" the files to update their timestamp if that is an acceptable option.
4) maintain the timestamp in a file of the last time you ran the cron job or subtract 5 minutes from the system time
5) Repeat
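
As a rough illustration of steps 1 to 4, here is a minimal sketch of the timestamp variant. It assumes a `find` that supports `-newer` and `-exec ... {} +`, and that the stamp file and the output file live outside the scanned directory. The function name `collect_new` and its arguments are made up for this example and do not come from the question. A file that is still being appended to will be picked up again on the next run, so lines from it can show up twice in the output.

```shell
#!/bin/sh
# collect_new DIR STAMP OUT PATTERN1 PATTERN2   (hypothetical helper)
# Greps only the files in DIR modified since the previous run and
# appends lines matching both patterns to OUT.
collect_new() {
    dir=$1 stamp=$2 out=$3 p1=$4 p2=$5

    # Record the start of this run before scanning, so files written
    # during the scan are picked up by the next run.
    touch "$stamp.new"

    if [ -e "$stamp" ]; then
        # Later runs: only files modified after the previous stamp.
        find "$dir" -type f -newer "$stamp" -exec grep -h -e "$p1" {} +
    else
        # First run: no stamp yet, so scan everything.
        find "$dir" -type f -exec grep -h -e "$p1" {} +
    fi | grep -e "$p2" >> "$out"

    # The stamp for the next run is the start time of this one.
    mv "$stamp.new" "$stamp"
}
```

From the cron script it might be called as `collect_new /dir /var/tmp/grep.stamp "$mydir/myDATA.txt" pattern pattern2`, where the stamp path is only a placeholder.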

Let me know if this doesn't make sense.

Also, you should be able to pipe the results of the first grep directly into the second grep without using AWK in between.
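
A minimal sketch of that idea, assuming the awk step was only there to get rid of the `filename:` prefix that grep prints when it searches several files. grep's `-h` option suppresses that prefix. The helper name `filter_lines` is invented for this example. The output is not byte-for-byte identical to the original pipeline, because `$1 = ""` in awk blanks the whole first whitespace-separated field, which includes the first word of the line along with the file name.

```shell
#!/bin/sh
# filter_lines PATTERN1 PATTERN2 FILE...   (hypothetical helper)
# Prints the lines that match both patterns, without file name prefixes.
filter_lines() {
    p1=$1 p2=$2
    shift 2
    # -h: do not prefix matches with the file name
    grep -h -e "$p1" -- "$@" | grep -e "$p2"
}
```

For the command in the question this would be `filter_lines pattern pattern2 /dir/* > "$mydir/myDATA.txt"`.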

Answer 2

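I would suggest using a tool like inotifywait to generate events for new files. You can continuously filter and read its output, and then do further processing on each new file. That way you don't need to implement a complicated mechanism for tracking the files you have already visited, and you can process each file as soon as it has been written.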
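
A minimal sketch of this approach, assuming Linux with inotify-tools installed, that writers close each file when they are done with it (which is what triggers the `close_write` event), and that the output file sits outside the watched directory so that appending to it does not generate events of its own. The function names `process_file` and `watch_dir` are invented for this example, and file names containing newlines are not handled.

```shell
#!/bin/sh
# process_file FILE OUT PATTERN1 PATTERN2   (hypothetical helper)
# Appends the lines of one file that match both patterns to OUT.
process_file() {
    grep -h -e "$3" -- "$1" | grep -e "$4" >> "$2"
}

# watch_dir DIR OUT PATTERN1 PATTERN2   (hypothetical helper)
# Runs until killed; handles each file in DIR once it is closed after writing.
watch_dir() {
    # -m: keep monitoring, -q: suppress startup messages,
    # --format '%w%f': print the full path of the file, one per line
    inotifywait -m -q -e close_write --format '%w%f' "$1" |
    while IFS= read -r file; do
        process_file "$file" "$2" "$3" "$4"
    done
}
```

This would run as a long-lived process instead of a cron job, for example `watch_dir /dir "$mydir/myDATA.txt" pattern pattern2`.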

Grep through multiple files
