Regex: avoiding keyword matches

Date: 2011-12-07 17:22:58

Tags: regex

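With the regular expression id: ([a-z]|[A-Z]+)\\w*, I can recognize every identifier that starts with a letter. Is there a way to use a single regular expression that also excludes certain identifiers, such as the keywords of a programming language?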

Imagine I have the following input line:

car zed var for while airplane

var, for and while are keywords of my programming language. The regular expression should match only car, zed and airplane.

Is this possible? Many thanks in advance!

2 answers:

Answer 0: (score: 2)

Tested with grep:

kent$  echo "car zed var for while airplane"|grep -Po '\b(?!(?:for|while|var)\b)\w+'
car
zed
airplane
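
Where the closing word boundary sits in a lookahead like this decides whether identifiers that merely begin with a keyword survive. Below is a minimal Python sketch of the difference. Python's re module is swapped in for grep here, and the sample string (with forest and variable) is made up for illustration. It compares a lookahead that checks only the keyword prefix with one that also requires a boundary right after the keyword:

```python
import re

# Hypothetical sample input: "forest" and "variable" start with keywords
# but are ordinary identifiers.
text = "car forest var for while variable airplane"

# Lookahead without a trailing \b: rejects any word that starts with a keyword.
prefix_only = re.compile(r"(?!\bfor|\bwhile|\bvar)\b\w+")

# Lookahead with a trailing \b: rejects only words that are exactly a keyword.
whole_word = re.compile(r"\b(?!(?:for|while|var)\b)\w+")

print(prefix_only.findall(text))  # ['car', 'airplane']
print(whole_word.findall(text))   # ['car', 'forest', 'variable', 'airplane']
```

If identifiers such as forest must be accepted, the second form is probably the one to use.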

Answer 1: (score: 1)

Use word anchors and alternation:

\b(var|for|while)\b

This matches only the exact keywords you listed.

Edit: I completely misread your question:

using System.Text.RegularExpressions;

// subjectString is the input text to scan, e.g. "car zed var for while airplane".
Regex regexObj = new Regex(@"\b(?!(?:for|var|while)\b)\w+\b");
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    // matched text: matchResults.Value
    // match start: matchResults.Index
    // match length: matchResults.Length
    matchResults = matchResults.NextMatch();
}
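
For readers outside .NET, the same loop can be sketched in Python. This is an illustrative port, not part of the original answer: the helper name find_identifiers is hypothetical, and the tuple layout (text, start, length) simply mirrors the three values named in the comments above.

```python
import re

# Same pattern as the C# snippet: a whole word that is not for, var or while.
IDENTIFIER = re.compile(r"\b(?!(?:for|var|while)\b)\w+\b")

def find_identifiers(subject):
    """Return (matched text, start index, length) for each non-keyword word."""
    return [
        (m.group(), m.start(), len(m.group()))
        for m in IDENTIFIER.finditer(subject)
    ]

print(find_identifiers("car zed var for while airplane"))
# [('car', 0, 3), ('zed', 4, 3), ('airplane', 22, 8)]
```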

Explanation:

"
\b                # Assert position at a word boundary
(?!               # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
   (?:            # Match the regular expression below
                  # Match either the regular expression below (attempting the next alternative only if this one fails)
         for      # Match the characters “for” literally
      |           # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
         var      # Match the characters “var” literally
      |           # Or match regular expression number 3 below (the entire group fails if this one fails to match)
         while    # Match the characters “while” literally
   )
   \b             # Assert position at a word boundary
)
\w                # Match a single character that is a “word character” (letters, digits, etc.)
   +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b                # Assert position at a word boundary
"