Question

I used to have a script like the one below:

for i in $(cat list.txt)
do
  grep $i sales.txt
done

cat list.txt

tomatoes
peppers
onions

and cat sales.txt

Price Products
$8.88 bread
$6.75 tomatoes
$3.34 fish
$5.57 peppers
$0.95 beans
$4.56 onions

I am a beginner with Bash/shell, and after reading posts like Why is using a shell loop to process text considered bad practice?, I changed my previous script to the following:

grep -f list.txt sales.txt

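Is this latter way really any better than using the for loop? At first I thought it was, but then I realized it is probably the same, since grep has to read the query file every time it greps a different line in the target file. Does anyone know whether it is actually better, and why? If it is better, I am probably missing something about how grep processes this task, but I can't figure it out.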

Answer 1

Expanding on my comment...

You can download the source for grep via git with:

 git clone https://git.savannah.gnu.org/git/grep.git

At line 96 of src/grep.c you can see this comment:

/* A list of lineno,filename pairs corresponding to -f FILENAME
   arguments. Since we store the concatenation of all patterns in
   a single array, KEYS, be they from the command line via "-e PAT"
   or read from one or more -f-specified FILENAMES.  Given this
   invocation, grep -f <(seq 5) -f <(seq 2) -f <(seq 3) FILE, there
   will be three entries in LF_PAIR: {1, x} {6, y} {8, z}, where
   x, y and z are just place-holders for shell-generated names.  */

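That is about all the clue we need: the patterns being searched for, whether they come in via -e or via -f with a file, are all dumped into one array. That array is then the source of the search. Walking through that array in C will be faster than your shell looping over a file, so that alone wins the speed race.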
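To make the point about -e and -f concrete, here is a minimal sketch that rebuilds the sample files from the question in a scratch directory (the mktemp directory and the variable names are just my assumptions for the demo) and checks that both ways of supplying the patterns select the same lines:

```shell
#!/bin/sh
# Recreate the question's sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' tomatoes peppers onions > "$tmpdir/list.txt"
printf '%s\n' 'Price Products' '$8.88 bread' '$6.75 tomatoes' '$3.34 fish' \
    '$5.57 peppers' '$0.95 beans' '$4.56 onions' > "$tmpdir/sales.txt"

# Patterns read from a file with -f ...
from_file=$(grep -f "$tmpdir/list.txt" "$tmpdir/sales.txt")

# ... and the same patterns given one by one with -e.
from_args=$(grep -e tomatoes -e peppers -e onions "$tmpdir/sales.txt")

if [ "$from_file" = "$from_args" ]; then
    echo "same matches either way"
fi

rm -rf "$tmpdir"
```

This only shows that the two forms behave the same from the outside; how grep stores the patterns internally is what the source comment above describes.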

Also, as I mentioned in my comment, grep -f list.txt sales.txt is easier to read, easier to maintain, and only one program (grep) has to be invoked.
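If you want to see the "only one program" part for yourself, a rough sketch is to wrap grep in a small counting function. The counted_grep helper and the file names are hypothetical, made up for this demo:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '%s\n' tomatoes peppers onions > "$tmpdir/list.txt"
printf '%s\n' 'Price Products' '$8.88 bread' '$6.75 tomatoes' '$3.34 fish' \
    '$5.57 peppers' '$0.95 beans' '$4.56 onions' > "$tmpdir/sales.txt"

# Hypothetical helper: count how many times grep gets started.
calls=0
counted_grep() {
    calls=$((calls + 1))
    grep "$@"
}

# Loop version: one grep process per line of list.txt.
for i in $(cat "$tmpdir/list.txt"); do
    counted_grep "$i" "$tmpdir/sales.txt"
done > "$tmpdir/loop.out"
loop_calls=$calls

# Single-call version: one grep process in total.
calls=0
counted_grep -f "$tmpdir/list.txt" "$tmpdir/sales.txt" > "$tmpdir/single.out"
single_calls=$calls

# On this sample data both versions print the same lines.
if cmp -s "$tmpdir/loop.out" "$tmpdir/single.out"; then same=yes; else same=no; fi

echo "loop: $loop_calls grep calls, grep -f: $single_calls grep call, same output: $same"
rm -rf "$tmpdir"
```

With three patterns the difference is invisible, but the loop's count grows with the length of list.txt while the grep -f count stays at one.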

Answer 2

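Your second version is better because:

It only requires a single pass over the file (not multiple passes, as you seem to think)
It has no globbing and spacing bugs (your first attempt behaves poorly for green beans or /*/*/*/*)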
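The spacing bug is easy to reproduce. In the sketch below the sales lines for green beans and green tea, including their prices, are made up for illustration and are not from the question:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '%s\n' 'green beans' > "$tmpdir/list.txt"
# Hypothetical sales data, invented for this demo.
printf '%s\n' '$0.95 beans' '$2.10 green beans' '$1.00 green tea' > "$tmpdir/sales.txt"

# Unquoted $(cat ...) is split on whitespace, so the loop searches
# for "green" and then for "beans" instead of "green beans".
loop_out=$(for i in $(cat "$tmpdir/list.txt"); do
    grep "$i" "$tmpdir/sales.txt"
done)

# grep -f treats each line of list.txt as one whole pattern.
single_out=$(grep -f "$tmpdir/list.txt" "$tmpdir/sales.txt")

printf 'loop:\n%s\n\ngrep -f:\n%s\n' "$loop_out" "$single_out"
rm -rf "$tmpdir"
```

The loop reports green tea and plain beans, and lists green beans twice, while grep -f prints only the green beans line. The globbing case works the same way: an entry such as /*/*/*/* is expanded by the shell into file names before grep ever sees it.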

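Reading files purely in shell code is totally fine when 1. you do it correctly and 2. the overhead is negligible, but neither of those really applies to your first example (apart from the fact that the files are currently small).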
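For reference, one common idiom for reading a file line by line in the shell is while IFS= read -r. This is a sketch of what "correctly" could look like, not necessarily what every setup needs, and the sample data is again made up:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '%s\n' 'green beans' > "$tmpdir/list.txt"
# Hypothetical sales data, invented for this demo.
printf '%s\n' '$0.95 beans' '$2.10 green beans' '$1.00 green tea' > "$tmpdir/sales.txt"

# IFS= keeps leading/trailing blanks, -r keeps backslashes literal,
# and quoting "$item" prevents word splitting and globbing.
# -F makes grep treat the item as a fixed string, -e marks it as the pattern.
safe_out=$(while IFS= read -r item; do
    grep -F -e "$item" "$tmpdir/sales.txt"
done < "$tmpdir/list.txt")

printf '%s\n' "$safe_out"
rm -rf "$tmpdir"
```

This fixes the correctness problems, but it still starts one grep per line of list.txt, so for anything but small files grep -f list.txt sales.txt remains the better choice.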

What is the faster/better practice: using a file query in a for loop, or querying a file with a file?