Question

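I need to use egrep to count the words that contain a string matching a regular expression. For example, I need to do something like "count the number of words that contain three consecutive vowels" (not exactly that, but that's the gist of it).

I've figured out how to count the lines that contain such words, but when I add the -w flag I get the error egrep: illegal option -- w.

This is the regex I'm using to count the lines in the scenario above, and it seems to work: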

egrep -i -c '[aeiou][aeiou][aeiou]' full.html

Adding the -w flag to this command produces the error above, even when I put \b markers around the regular expression, e.g.:

egrep -i -c -w '\b.*[aeiou][aeiou][aeiou].*\b' full.html

What am I doing wrong?

Edit: I'm running this in a terminal on Solaris 10.

Answer 1

You can also get a count of the words containing the string this way:

grep --color -Eow '[aeiou][aeiou][aeiou]' filename | wc -l

或

egrep -ow '[aeiou][aeiou][aeiou]' filename | wc -l

-o prints only the matched part.

-w requires the match to form a whole word.

Finally, wc -l shows the count of words.
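With -w each match must be a whole word by itself, so the pattern above counts words made up of exactly three vowels (such as iou) rather than words that merely contain three vowels in a row (such as beautiful). A minimal sketch contrasting the two, assuming GNU grep (-o is a GNU extension and is not available in Solaris 10's /usr/bin/egrep); the function names and sample.txt are hypothetical, made up for illustration:

```shell
#!/bin/sh
# Sketch assuming GNU grep: -o (print only the match) is a GNU extension.

# Words that consist of exactly three vowels: with -w the match
# itself must be a whole word, so "beautiful" is not counted.
count_whole_vowel_words() {
    grep -Eiow '[aeiou]{3}' "$1" | wc -l
}

# Words that contain a run of three vowels: the [a-z]* on each side
# stretches every match out to the surrounding letters of its word.
count_words_containing() {
    grep -Eio '[a-z]*[aeiou]{3}[a-z]*' "$1" | wc -l
}

# Hypothetical sample file, created only for this illustration.
printf 'beautiful iou queue\nthe eai aeon\n' > sample.txt

count_whole_vowel_words sample.txt   # iou, eai
count_words_containing sample.txt    # beautiful, iou, queue, eai, aeon
```

On this sample the first function prints 2 and the second prints 5.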

Answer 2

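You'll have to consult your Solaris man pages to find out whether your egrep supports any, all, or some of the GNU-style extensions.

Does your system have /usr/xpg4/bin? If so, make sure your MANPATH includes /usr/xpg4/man. That directory used to hold the most up-to-date versions of the tools, short of adding something like an /opt/gnu install.

In any case, your regex '\b.*[aeiou][aeiou][aeiou].*\b' reads to me as...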

1 word-boundary
followed by any number of any chars (including blanks and vowels) 
followed by three vowels, 
followed by any number of any chars (including blanks and vowels), 
followed by 1 word-boundary.

which is probably not what you really want.

To match words with 3 consecutive vowels using old-school, long-hand regex syntax, try

 egrep -i -c '[a-z]*[aeiou][aeiou][aeiou][a-z]*' full.html

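This says: match any number (including none) of [a-z] characters, then 3 vowels, then any number (including none) of [a-z] characters. Because a space character does not match [a-z], a match cannot run across words. You're using -i to ignore case, so you don't need [A-Za-z]. Obviously, if you find other characters you want to treat as word characters (the '_' character, perhaps?), add them on both sides.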

Sorry, but I'm going from memory; I don't work in a Solaris shop and can't test there.

Edit

Also note that the grep man page on my current system says

-c, --count Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option (see below), count non-matching lines.

Note that this is the number of matching lines, not the number of matches.
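Because -c counts lines, one workaround that avoids -o entirely is to put each word on its own line first and then let egrep -c count those lines. A sketch, assuming a POSIX tr that understands [:alpha:] and the [\n*] repeat syntax (an assumption: on Solaris 10 the POSIX-conforming tr is expected under /usr/xpg4/bin rather than /usr/bin); count_vowel_run_words and words.txt are hypothetical names:

```shell
#!/bin/sh
# Sketch: count words (not lines) that contain three consecutive vowels,
# without relying on grep -o.
count_vowel_run_words() {
    # -c complements the letter class and -s squeezes repeats, so every
    # run of non-letters becomes a single newline: one word per line.
    tr -cs '[:alpha:]' '[\n*]' < "$1" |
        # each line is now one word, so the line count is a word count
        egrep -ic '[aeiou][aeiou][aeiou]'
}

# Hypothetical sample file, created only for this illustration.
printf 'beautiful iou queue\nthe eai aeon\n' > words.txt

count_vowel_run_words words.txt
```

Here a "word" is a maximal run of letters, so punctuation and markup act as separators; in an HTML file such as full.html, tag and attribute names would be counted as words too.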

It may be easier to use

awk '{for (i=1;i<=NF;i++) if ($i ~ /[aeiou][aeiou][aeiou]/) cnt++} END{print "count=" cnt+0}' file
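The same awk idea, spread out with comments and made case-insensitive to mirror egrep -i. A sketch assuming a POSIX awk with tolower() (on Solaris 10 that would be nawk or /usr/xpg4/bin/awk; an assumption here is that the old /usr/bin/awk may not provide it); count_vowel_fields and fields.txt are hypothetical names:

```shell
#!/bin/sh
# Sketch: count whitespace-separated fields containing three consecutive vowels.
count_vowel_fields() {
    awk '
    {
        # Each whitespace-separated field counts as one word; tolower()
        # makes the test case-insensitive, like egrep -i.
        for (i = 1; i <= NF; i++)
            if (tolower($i) ~ /[aeiou][aeiou][aeiou]/)
                cnt++
    }
    # cnt + 0 prints 0 rather than an empty string when nothing matched
    END { print cnt + 0 }
    ' "$1"
}

# Hypothetical sample file, created only for this illustration.
printf 'BEAUTIFUL iou queue\nthe eai aeon\n' > fields.txt

count_vowel_fields fields.txt
```

Since fields are split on whitespace, punctuation stays attached to its word; that does not change whether a field matches, but something like foo-aeon counts as a single word.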

IHTH

Answer 3

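I don't think egrep supports the \b word boundary. Try \< for the start-of-word boundary and \> for the end-of-word boundary.

Edit
Hmm... never mind. According to {{3}}, \b is supported.

Actually, I think the answer is that only grep supports the -w option. I don't think egrep does (see the man page).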

Answer 4

Which platform and which version of egrep?

The -w option works for me (CentOS and Mac, both with GNU egrep), as shown below. \b also works as expected.

I also used a different regex, shown below.

$ grep --version
grep (GNU grep) 2.5.1

$ cat test.txt 
this and that iou and eai
not this aaih
not this haai

$ egrep -i -w '[aeiou]{3}' test.txt 
this and that iou and eai

# with no -w
$ egrep -i '\b[aeiou]{3}\b' test.txt
this and that iou and eai

# with neither -w nor {3}
$ egrep -i '\b[aeiou][aeiou][aeiou]\b' /tmp/test.txt 
this and that iou and eai

# using '\<' and '\>' works as well for word boundaries
$ egrep -i '\<[aeiou][aeiou][aeiou]\>' /tmp/test.txt 
this and that iou and eai

How do I use egrep to list the words that match a regular expression?
