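The title says it all: I need to filter a file with egrep according to a spec, but what I can't figure out is how to make sure the word occurs three times. (Direct wording from the problem: a word of 5 or more characters that occurs at least three times in the line.)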
Answer 0 (score: 1)
egrep '([a-zA-Z]{5}).*\1.*\1'
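This works in my quick tests, but I'm not sure how robust it is.
\1 (and \2, \3, ...) are backreferences. I put ( and ) around the five-letter pattern [a-zA-Z]{5}, which makes it the first capture group. \1 then means the regex expects to find a repeat of the same text that was matched inside that first ( ) group.
Finally, there is a .* between the three occurrences so that anything may appear between them.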
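A quick way to see the backreference at work, as a minimal sketch. It assumes a grep that accepts backreferences in extended regular expressions (GNU grep does; POSIX leaves them undefined for EREs) and uses grep -E, the equivalent of egrep. The sample lines are made up for illustration:

```shell
# Made-up sample input: only the first line repeats a five-letter word three times.
sample='apple pie, apple tart, apple jam
apple pie, apple tart, peach jam'

# \1 must match the same text that the first group captured,
# so only the line with three occurrences of "apple" is printed.
result=$(printf '%s\n' "$sample" | grep -E '([a-zA-Z]{5}).*\1.*\1')
printf '%s\n' "$result"
```

Because the pattern has no word boundaries, it also matches five-letter fragments of longer words: a line containing totally, total and totally matches through the fragment total.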
Answer 1 (score: 0)
Using awk (untested):
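awk '
{
    split("", count)                       # reset the per-line word counts
    n = split($0, words, /[^a-zA-Z]+/)     # split the line into runs of letters
    for (i = 1; i <= n; i++) {
        # print the line once a word of five or more letters is seen a third time
        if (length(words[i]) >= 5 && ++count[words[i]] == 3) {
            print
            next
        }
    }
}
' file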
Answer 2 (score: 0)
$ cat ip.txt
abc abc abc should not match
totally this line should totally match, isn't it? totally
Title: word with 5 letters like title should also match, given title is present 3 or more times
this line should not totally match, total only partly matches with totally
Match whole words, case-sensitively:
$ grep -wE '([a-zA-Z]{5,}).*\1.*\1' ip.txt
totally this line should totally match, isn't it? totally
Match whole words regardless of case:
$ grep -iwE '([a-zA-Z]{5,}).*\1.*\1' ip.txt
totally this line should totally match, isn't it? totally
Title: word with 5 letters like title should also match, given title is present 3 or more times
Match any sequence of five or more letters:
$ grep -iE '([a-zA-Z]{5,}).*\1.*\1' ip.txt
totally this line should totally match, isn't it? totally
Title: word with 5 letters like title should also match, given title is present 3 or more times
this line should not totally match, total only partly matches with totally
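To check which word actually occurs three times on a given line, the words can be counted directly. This is only a sketch for inspecting one line, not a replacement for the grep commands above; the variable names are illustrative, and grep -o is assumed to be available (it is in GNU and BSD grep):

```shell
line='Title: word with 5 letters like title should also match, given title is present 3 or more times'

# Print each run of five or more letters on its own line, fold to lowercase,
# count the duplicates, and keep the words seen at least three times.
repeated=$(printf '%s\n' "$line" \
    | grep -oE '[a-zA-Z]{5,}' \
    | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c \
    | awk '$1 >= 3 { print $2 }')
printf '%s\n' "$repeated"
```

For this line the only such word is title, and one of its three occurrences is capitalized, which is why the line is matched only when case is ignored.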
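-E: extended regular expressions
-w: match whole words only
-i: ignore case
[a-zA-Z]{5,}: lowercase or uppercase letters, five or more times
(): capture group; \1 is a backreference to it
If PCRE regular expressions are available, there is a bit of fun to be had: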
$ echo 'totally title match' | grep -P '([a-zA-Z]{5,}).*(?1).*(?1)'
totally title match
(?1) refers to the regex pattern ([a-zA-Z]{5,}) itself, not to the text it matched, so the three words need not be identical.
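To make the difference from \1 concrete, the sketch below runs the same input through both forms. It assumes a grep built with PCRE support (the -P option of GNU grep); the variable names are only for illustration:

```shell
line='totally title match'

# \1 demands the captured text again, and no word here repeats: no match.
with_backref=$(printf '%s\n' "$line" | grep -P '([a-zA-Z]{5,}).*\1.*\1')

# (?1) re-runs the sub-pattern [a-zA-Z]{5,}, so any three runs of
# five or more letters satisfy it, even if they are different words.
with_subroutine=$(printf '%s\n' "$line" | grep -P '([a-zA-Z]{5,}).*(?1).*(?1)')

printf 'backreference: [%s]\n' "$with_backref"
printf 'subroutine:    [%s]\n' "$with_subroutine"
```

So the (?1) form answers a different question (three long words on a line) and does not meet the original requirement that the same word occur three times.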