Finding lines that contain a word occurring more than once, using grep

Date: 2014-12-07 02:25:37

Tags: bash grep

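How can I find all lines that contain a repeated lowercase word? I would like to do this with egrep. This is what I have tried so far, but I keep getting an "invalid back reference" error: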

egrep '\<(.)\>\1' inputFile.txt
egrep -w '\b(\w)\b\1' inputFile.txt

For example, if I have the following file:

The sky was grey. 
The fall term went on and on.
I hope every one has a very very happy holiday.
My heart is blue.
I like you too too too much
I love daisies.

it should find the following lines in the file:

The fall term went on and on.
I hope every one has a very very happy holiday.
I like you too too too much

It should find these lines because the words on, very, and too each appear more than once in their line.

4 answers:

Answer 0 (score: 1)

Got it: you need to find the repeated words (all lowercase).

sed -n '/\<\([a-z]\+\)\>.*\<\1\>/p' infile

Tools are there to serve your requirements; restricting yourself to a single tool is not a good approach.

\1 (a back-reference) is a sed feature, but I'm not sure whether grep/egrep supports it as well.
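For what it's worth, back-references are part of POSIX basic regular expressions, so plain grep (BRE mode) accepts the same \( \) and \1 syntax as sed. The sketch below assumes GNU grep, because \< \> (word boundaries) and \+ (one or more) are GNU extensions; the file name inputFile.txt is taken from the question.

```shell
#!/bin/sh
# Recreate the question's sample input.
cat > inputFile.txt <<'EOF'
The sky was grey.
The fall term went on and on.
I hope every one has a very very happy holiday.
My heart is blue.
I like you too too too much
I love daisies.
EOF

# \<\([a-z]\+\)\> captures a whole lowercase word; .* skips ahead;
# \<\1\> requires the same word to reappear later as a whole word.
grep '\<\([a-z]\+\)\>.*\<\1\>' inputFile.txt
```

On this input it prints the three lines the question expects (on, very, too). Capitalized words such as The are never captured, because [a-z]+ cannot start at an uppercase letter and \< cannot match in the middle of a word.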

Answer 1 (score: 1)

This is possible with the -E or -P option.

grep -E '(\b[a-z]+\b).*\b\1\b' file

Example:

$ cat file
The fall term went on and on.
I hope every one has a very very happy holiday.
Hi foo bar.
$ grep -E '(\b[a-z]+\b).*\b\1\b' file
The fall term went on and on.
I hope every one has a very very happy holiday.
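The answer mentions -P but only demonstrates -E. A Perl-compatible equivalent might look like the sketch below; it assumes a grep built with PCRE support (GNU grep on most Linux distributions), and reuses the sample file from the example above.

```shell
#!/bin/sh
cat > file <<'EOF'
The fall term went on and on.
I hope every one has a very very happy holiday.
Hi foo bar.
EOF

# -P: Perl-compatible regex. \b marks a word boundary, ([a-z]+) captures a
# lowercase word, and \b\1\b requires that exact word again later on the line.
grep -P '\b([a-z]+)\b.*\b\1\b' file
```

It should print the same two lines as the -E version; the pattern is identical apart from the grouping being placed inside the boundaries.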

Answer 2 (score: 1)

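I know this is about grep, but here is an awk solution. It is more flexible because you can easily change the test on the counter c: c==2 for exactly two identical words, c>=2 for two or more identical words.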

awk -F"[ \t.,]" '{c=0;for (i=1;i<=NF;i++) if ($i != "") a[$i]++; for (i in a) c=c<a[i]?a[i]:c;delete a} c==2' file
The fall term went on and on.
I hope every one has a very very happy holiday.

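It loops over all the words on a line, using each word as an array index to count how often it occurs, then loops over the array to find the highest count, which tells whether any word is repeated.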
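A variant of the same counting idea, adjusted toward the question's requirements, might look like this sketch: only all-lowercase words are counted, empty fields produced by adjacent separators such as ". " are skipped, and the test is c >= 2 so the "too too too" line is included. It assumes a POSIX awk; the array name seen is illustrative.

```shell
#!/bin/sh
cat > file <<'EOF'
The sky was grey.
The fall term went on and on.
I hope every one has a very very happy holiday.
My heart is blue.
I like you too too too much
I love daisies.
EOF

# Track the highest per-word count while counting, instead of a second loop.
# split("", seen) empties the array portably before each line.
awk -F'[ \t.,]+' '{
    c = 0
    split("", seen)
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[a-z]+$/ && ++seen[$i] > c)
            c = seen[$i]
} c >= 2' file
```

Changing the final pattern to c == 2 or c >= 3 selects lines with exactly two or at least three occurrences of the most frequent word.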

Answer 3 (score: 0)

egrep '[a-z]*' my_file

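This matches the runs of lowercase characters in each line. Because [a-z]* also matches the empty string, every line of the file is printed.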

 egrep '[a-z]*' --color my_file

This highlights the lowercase characters in color.
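To list the lowercase runs themselves rather than whole lines, grep's -o option prints each non-empty match on its own line. A sketch, with a one-line my_file created for illustration:

```shell
#!/bin/sh
printf 'I like you too too too much\n' > my_file

# -o prints only the matched parts, one per line.
# [a-z]+ (instead of [a-z]*) avoids matching the empty string.
grep -oE '[a-z]+' my_file
```

This prints like, you, too, too, too, much on separate lines; the uppercase I is not matched.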