Question

I have a text file containing a list of domains, each with the number of times it appears in a collection of email files. For example:

 598 aol.com
  1 aOL.COM
  4 Aol.com
  1 AOl.com
  6 AOL.com
 39 AOL.COM

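There were 598 emails sent to aol.com, 1 sent to aOL.COM, and so on. Is there a way in bash to combine aol.com, aOL.COM, and all the other case variants, since they are really the same domain? Any help would be greatly appreciated!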

This is the line of code that produced that output:

grep -E -o -r "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" $ARCHIVE | sed 's/.*@//' | sort | uniq -c > temp2

Answer 1

Add the -i (--ignore-case) flag to the uniq command in your one-liner:

grep -E -o -r "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" $ARCHIVE \
    | sed 's/.*@//' \
    | sort \
    | uniq -ic > temp2

From the uniq man page:

-i
--ignore-case
    Ignore differences in case when comparing lines.
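
One caveat worth checking: uniq only merges adjacent lines, so -i helps only if sort has already placed the case variants next to each other. In the C locale, plain sort orders upper case before lower case, which can leave other domains between them; sort -f (--ignore-case) folds case while sorting. A minimal sketch, assuming a made-up list of domains standing in for the grep and sed output (the sample data and the variable names are illustrative, not from the question):

```shell
# Hypothetical sample input: one domain per line, as the sed step would emit.
domains='aol.com
example.org
AOL.COM
Aol.com
Example.org'

# sort -f folds case so that case variants end up on adjacent lines;
# uniq -ic then counts each group of adjacent variants as one entry.
counts=$(printf '%s\n' "$domains" | LC_ALL=C sort -f | uniq -ic)
printf '%s\n' "$counts"
```

Each domain should then appear once, with the counts of all its spellings added together. Which spelling is printed depends on the one that sorts first within the group.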

Answer 2

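I would suggest changing the program that generates this output so that it converts everything to lower case first (see "Converting string to lower case in Bash shell scripting"), and only then sorts.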

Doing it after the fact will only make your life harder.
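
The lower-casing step suggested here could be sketched with tr placed between sed and sort in the original pipeline. This is one possible way to do it, not necessarily what the linked question recommends, and the sample data below is made up for illustration:

```shell
# Hypothetical sample input: one domain per line, as the sed step would emit.
domains='aol.com
example.org
AOL.COM
Aol.com
Example.org'

# tr maps every upper-case letter to lower case before sorting,
# so plain sort and uniq -c see identical lines for each domain.
counts=$(printf '%s\n' "$domains" | tr '[:upper:]' '[:lower:]' | sort | uniq -c)
printf '%s\n' "$counts"
```

Because the lines are normalized before counting, every domain is reported in lower case and no case-insensitive flags are needed on sort or uniq.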