Question

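The title says it all. I need to filter a file using egrep with a given specification, but what I can't figure out is how to make sure the pattern occurs three times. (Direct wording from the problem: a word containing 5 or more characters that occurs at least three times on the line.)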

Answer 1

egrep '([a-zA-Z]{5}).*\1.*\1'

This works in my quick tests, but I'm not sure how robust it is.

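\1 (and \2, \3, ...) are backreferences. I put ( and ) around the pattern for five letters, [a-zA-Z]{5}; this is called the first capture group. \1 then tells the regex to expect a repeat of the exact text that was matched inside that first ( ) group.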

Finally, there is a .* between the three occurrences so that anything can appear between them.
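
To see how the backreference behaves, here is a minimal sketch that wraps the pattern from this answer in a small shell function. The name has_triple is hypothetical, and grep -E is used as the equivalent of egrep. It assumes a grep that accepts backreferences in extended regular expressions, as GNU grep does. The last call illustrates a caveat: the group is not tied to word boundaries, so a five-letter stretch inside a longer word counts as an occurrence.

```shell
# Hypothetical helper: succeeds if the line contains some five-letter
# sequence at least three times (this answer's pattern, via grep -E).
has_triple() {
  printf '%s\n' "$1" | grep -qE '([a-zA-Z]{5}).*\1.*\1'
}

has_triple 'hello world hello again hello' && echo 'match: hello appears three times'
has_triple 'hello world hello again'       || echo 'no match: hello appears only twice'
# "total" is found inside "totally" and "totals", so this line matches too.
has_triple 'totally total totals'          && echo 'match: total found as a substring'
```

If whole words are required, the -w option discussed in another answer is one way to tighten this.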

Answer 2

Using awk (untested):

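awk '
  {
    split("", count)                      # reset the per-line word counts
    n = split($0, words, /[^a-zA-Z]+/)    # break the line into runs of letters
    for (i = 1; i <= n; i++) {
      # a word of 5+ letters seen for the third time: print the line once
      if (length(words[i]) >= 5 && ++count[words[i]] == 3) { print; break }
    }
  }
' file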

Answer 3

$ cat ip.txt 
abc abc abc should not match
totally this line should totally match, isn't it? totally 
Title: word with 5 letters like title should also match, given title is present 3 or more times
this line should not totally match, total only partly matches with totally

Matching whole words, case-sensitively:

$ grep -wE '([a-zA-Z]{5,}).*\1.*\1' ip.txt 
totally this line should totally match, isn't it? totally

Matching whole words regardless of case:

$ grep -iwE '([a-zA-Z]{5,}).*\1.*\1' ip.txt 
totally this line should totally match, isn't it? totally 
Title: word with 5 letters like title should also match, given title is present 3 or more times

Matching any sequence of five or more letters:

$ grep -iE '([a-zA-Z]{5,}).*\1.*\1' ip.txt 
totally this line should totally match, isn't it? totally 
Title: word with 5 letters like title should also match, given title is present 3 or more times
this line should not totally match, total only partly matches with totally

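-E use extended regular expressions
-w match whole words only
-i ignore case
[a-zA-Z]{5,} a lowercase or uppercase letter, five or more times
() capture group; \1 is a backreference to it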
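
Since the title asks about n occurrences in general, the same idea can be sketched with a repeated group instead of writing .*\1 out by hand. The function name filter_repeats and its argument order are assumptions made for illustration, and the sketch relies on grep accepting a backreference inside a repeated group, which GNU grep does.

```shell
# Hypothetical helper: print the lines of a file in which some whole word
# of at least MIN_LEN letters occurs at least N times (N >= 2).
# Usage: filter_repeats N MIN_LEN FILE
filter_repeats() {
  n=$1 min_len=$2 file=$3
  # One capture plus (N-1) repeats of ".*\1" gives N occurrences in total.
  # For N=3 and MIN_LEN=5 the pattern is: ([a-zA-Z]{5,})(.*\1){2}
  grep -wE "([a-zA-Z]{${min_len},})(.*\\1){$((n - 1))}" "$file"
}

# Small demonstration on a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'abc abc abc should not match' \
  'totally this line should totally match, totally' \
  'only twice: totally and totally' > "$sample"

filter_repeats 3 5 "$sample"   # prints only the second line
filter_repeats 2 5 "$sample"   # prints the second and third lines
rm -f "$sample"
```

Adding -i to the grep call would make the comparison case-insensitive, as in the examples above.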

If you have PCRE regular expressions available, there is a bit more fun to be had:

$ echo 'totally title match' | grep -P '([a-zA-Z]{5,}).*(?1).*(?1)'
totally title match

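(?1) refers to the regex pattern ([a-zA-Z]{5,}) itself, not to the text it matched. The pattern is simply run again at that point, which is why three different words ("totally", "title", "match") satisfy it.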
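
The difference between \1 and (?1) can be checked side by side. This is a small sketch that assumes a grep built with PCRE support (the -P option); count_matches is a hypothetical helper name.

```shell
# Hypothetical helper: count_matches PATTERN LINE
# Prints 1 if grep -P matches the line and 0 otherwise.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_matches() {
  printf '%s\n' "$2" | grep -cP "$1" || true
}

# Backreference: the same text must repeat, so three different words fail.
count_matches '([a-zA-Z]{5,}).*\1.*\1'     'totally title match'
# Subroutine call: the subpattern is re-run, so any three 5+ letter runs do.
count_matches '([a-zA-Z]{5,}).*(?1).*(?1)' 'totally title match'
```

The first call prints 0 and the second prints 1, so (?1) is not a substitute for \1 when the same word has to repeat.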

UNIX - Using egrep, how do I filter for n occurrences of a pattern?
