Question

I want to use awk to print every word in a file that contains more than 2 vowels.

This is my code so far:

#!/bin/bash
cat $1 | awk '{   #Default file separator is space 
for (i=1;i<=NF;i++)  #for every word          
  {
  if ($i ~ /([aeiojy]){2,}/)            
    {
      print $i
    }
}}'

The regular expression is the problem.

/([aeiojy]){2,}/ is what I actually came up with, but it doesn't work.

Answer 1

This should work with GNU grep:

grep -Poi '([^[:space:]]*?[aeiou]){3,}[^[:space:]]*' file

Options:

-P perl compatible regular expressions
-o output every match on a single line
-i case insensitive match

Regular expression:

(                start of subpattern
  [^[:space:]]*  zero or more arbitrary non whitespace characters
  ?              ungreedy quantifier for the previous expression (perl specific)
  [aeiou]        vowel
)                end of subpattern
{3,}             the previous expression appears 3 or more times
[^[:space:]]*    zero or more other characters until word boundary.

By the way, perl compatible regular expressions are not actually needed here. With plain grep you can use:

grep -oi '\([^[:space:]aeiou]*[aeiou]\)\{3,\}[^[:space:]]*' file

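Note: the examples above do not exclude punctuation (any non-whitespace character counts as part of a word), but that can be added if needed.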
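To make the note about punctuation concrete, here is a minimal sketch that runs the plain grep pattern on a small sample and compares it with a variant that also stops at punctuation. The sample text, the temporary file, and the function names three_vowels and three_vowels_no_punct are assumptions made for illustration; the second pattern is one possible way to add the exclusion, not necessarily what the answer had in mind.

```shell
#!/bin/bash
# Hypothetical sample input, written to a temporary file for illustration.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'The quick brown fox' \
              'jumps over a beautiful education, indeed.' > "$sample"

# Words with three or more vowels; attached punctuation is kept.
three_vowels() {
  grep -oi '\([^[:space:]aeiou]*[aeiou]\)\{3,\}[^[:space:]]*' "$1"
}

# Assumed variant: punctuation is treated as a delimiter as well,
# by adding [:punct:] to both negated bracket expressions.
three_vowels_no_punct() {
  grep -oi '\([^[:space:][:punct:]aeiou]*[aeiou]\)\{3,\}[^[:space:][:punct:]]*' "$1"
}

three_vowels "$sample"
three_vowels_no_punct "$sample"
```

With this sample, the first function should print "beautiful", "education," and "indeed." with the trailing punctuation attached, while the second prints the same three words without it. One side effect of the variant is that a hyphenated word is split at the hyphen and each part is checked on its own.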

Answer 2

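You can use the split function in awk:

awk -v RS=' ' 'split($0, a, /[aeiouAEIOU]/) > 2' file

-v RS=' ' makes awk treat each space-separated word as a separate record.
split returns the number of pieces the word was cut into, which is one more than the number of vowels. The condition > 2 therefore prints words with at least two vowels; use > 3 to require three or more.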
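For input that spans several lines, note that RS=' ' leaves newlines inside the records, so the last word of one line and the first word of the next end up in the same record. The sketch below is an alternative, not the answer's own method: it keeps awk's default record and field splitting, loops over the fields as the question does, and counts vowels with gsub, which returns the number of replacements it made. It also avoids the {2,} interval syntax, which some awk implementations (older mawk releases, for example) do not support. The function name vowel_words, its optional min argument, and the sample text are assumptions for illustration.

```shell
#!/bin/bash
# Print every whitespace-separated word that has at least `min` vowels.
# Usage: vowel_words FILE [MIN]   (MIN defaults to 3, i.e. "more than 2")
vowel_words() {
  awk -v min="${2:-3}" '{
    for (i = 1; i <= NF; i++) {
      word = $i                      # work on a copy so $i stays intact
      # gsub returns how many vowels it removed from the copy
      if (gsub(/[aeiouAEIOU]/, "", word) >= min) print $i
    }
  }' "$1"
}

# Hypothetical sample input for illustration.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'The quick brown fox' 'jumps over a beautiful education' > "$sample"

vowel_words "$sample"      # three or more vowels
vowel_words "$sample" 2    # two or more vowels
```

On this sample the first call should print "beautiful" and "education"; the second call additionally prints "quick" and "over".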

Linux Ubuntu Bash - Find words with more than 2 vowels using AWK regular expressions
