Question

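I am trying to clean up file1.txt, whose lines always have the same format, using file2.txt, which contains a list of IP addresses to remove. I have written a working script, but I believe it can be improved to run faster.

My script: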

#!/bin/bash
IFS=$'\n'
for i in $(cat file1.txt); do
        for j in $(cat file2); do
                echo ${i} | grep -v ${j}
        done
done

I have tested the script with the following data set:

Amount of lines in file1.txt = 10,000
Amount of lines in file2.txt = 3

Script execution time:
real    0m31.236s
user    0m0.820s
sys     0m6.816s

Contents of file1.txt:

I3fSgGYBCBKtvxTb9EMz,1.1.2.3,45,This IP belongs to office space,1539760501,https://myoffice.com
I3fSgGYBCBKtvxTb9EMz,1.2.2.3,45,This IP belongs to office space,1539760502,https://myoffice.com
I3fSgGYBCBKtvxTb9EMz,1.3.2.3,45,This IP belongs to office space,1539760503,https://myoffice.com
I3fSgGYBCBKtvxTb9EMz,1.4.2.3,45,This IP belongs to office space,1539760504,https://myoffice.com
I3fSgGYBCBKtvxTb9EMz,1.5.2.3,45,This IP belongs to office space,1539760505,https://myoffice.com
... lots of other lines in the same format
I3fSgGYBCBKtvxTb9EMz,4.1.2.3,45,This IP belongs to office space,1539760501,https://myoffice.com

Contents of file2.txt:

1.1.2.3
1.2.2.3
... lots of other IPs here
1.2.3.9

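How can I improve these timings? I expect the files to grow over time. In my case the script will run from cron every hour, so I would like to make it faster.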

Answer 1

You want to remove every line of file1.txt that contains a substring matching an entry in file2.txt. grep to the rescue:

grep -vFwf file2.txt file1.txt

The -w is needed so that 11.11.11.11 does not match 111.11.11.111.
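To see what -w changes, here is a minimal sketch on made-up sample data (the two lines and the temporary directory are assumptions for illustration, not taken from the question). It runs the same filter with and without -w:

```shell
#!/bin/bash
# Illustrative sample data: one exact hit and one longer address that
# merely contains the pattern as a substring.
workdir=$(mktemp -d)
printf '%s\n' 'a,11.11.11.11,45' 'b,111.11.11.111,45' > "$workdir/file1.txt"
printf '%s\n' '11.11.11.11' > "$workdir/file2.txt"

# Without -w, "11.11.11.11" is found inside "111.11.11.111",
# so both lines are removed and nothing is left.
without_w=$(grep -vFf "$workdir/file2.txt" "$workdir/file1.txt")

# With -w, the match must be delimited by non-word characters,
# so only the exact address is removed.
with_w=$(grep -vFwf "$workdir/file2.txt" "$workdir/file1.txt")

printf 'kept without -w: [%s]\n' "$without_w"
printf 'kept with -w:    [%s]\n' "$with_w"
rm -rf "$workdir"
```

Because "." and "," are not word characters, an address in a comma-separated field counts as a whole word here.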

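-F, --fixed-strings, --fixed-regexp Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX, --fixed-regexp is an obsoleted alias, please do not use it in new scripts.)

-f FILE, --file=FILE Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing. (-f is specified by POSIX.)

-w, --word-regexp Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.

source: man grep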
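The -F flag matters for correctness as well as speed, because an unescaped "." in a regular expression matches any character. The sketch below uses made-up sample data (an assumption for illustration) to compare the two modes:

```shell
#!/bin/bash
# Illustrative sample data: the exact address and a different, valid
# address that the pattern matches when "." is treated as a wildcard.
workdir=$(mktemp -d)
printf '%s\n' 'a,1.1.2.3,45' 'b,10.1.1.223,45' > "$workdir/file1.txt"
printf '%s\n' '1.1.2.3' > "$workdir/file2.txt"

# Without -F each "." is a regex wildcard, so the pattern also matches
# the "1.1.223" part of "10.1.1.223" and both lines are removed.
as_regex=$(grep -vwf "$workdir/file2.txt" "$workdir/file1.txt")

# With -F the pattern is a literal string, so only the exact address
# is removed.
as_fixed=$(grep -vFwf "$workdir/file2.txt" "$workdir/file1.txt")

printf 'kept with regex patterns: [%s]\n' "$as_regex"
printf 'kept with fixed patterns: [%s]\n' "$as_fixed"
rm -rf "$workdir"
```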

As a further note, here are a few pointers for your script:

Don't use a for loop to read a file (http://mywiki.wooledge.org/DontReadLinesWithFor).
Don't use cat (see How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?)
Use quotes! (see Bash and Quotes)

This allows us to rewrite it as:

#!/bin/bash
while IFS=$'\n' read -r i; do
  while IFS=$'\n' read -r j; do
      echo "$i" | grep -v "$j"
  done < file2.txt
done < file1.txt

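The problem now is that file2.txt is read N times, where N is the number of lines in file1.txt, and a separate grep process is started for every pair of lines. That is not very efficient. Luckily, grep gives us the solution (see the top of this answer).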
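The grep command prints the filtered lines to standard output. If the goal is to update file1.txt itself, for example from cron, one possible wrapper is sketched below. The function name clean_file and the sample data are hypothetical, and the sketch assumes mktemp is available. It writes to a temporary file first and replaces the original only if grep did not report an error:

```shell
#!/bin/bash
# clean_file is a hypothetical helper, not part of the original answer.
# Usage: clean_file DATA_FILE PATTERN_FILE
clean_file() {
    local data=$1 patterns=$2 tmp status
    # Create the temporary file next to the data file so mv stays on
    # the same filesystem.
    tmp=$(mktemp "${data}.XXXXXX") || return 1
    grep -vFwf "$patterns" "$data" > "$tmp"
    status=$?
    # grep exits 0 if lines were selected, 1 if none were, >1 on error.
    # Status 1 only means every line was removed, which is not a failure.
    if [ "$status" -le 1 ]; then
        mv -- "$tmp" "$data"
    else
        rm -f -- "$tmp"
        return "$status"
    fi
}

# Demo on throwaway sample data.
workdir=$(mktemp -d)
printf '%s\n' 'a,1.1.2.3,45' 'b,1.3.2.3,45' 'c,1.2.2.3,45' > "$workdir/file1.txt"
printf '%s\n' '1.1.2.3' '1.2.2.3' > "$workdir/file2.txt"
clean_file "$workdir/file1.txt" "$workdir/file2.txt"
remaining=$(cat "$workdir/file1.txt")
printf '%s\n' "$remaining"
rm -rf "$workdir"
```

Since mktemp creates the temporary file with restrictive permissions, the replaced file may end up with different permissions than the original. Adjust them afterwards if other users need to read it.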
