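Regex to select everything BUT a group

Time: 2015-08-04 11:26:33

Tags: regex

So, I have to use only a regular expression to select everything except a specific word. For the sake of example, that word will be foobar. Here is an example of what should happen: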

this should be highlighted, and
same with this. but any sentence
that has the word
foobar
shouldnt be, and same for any regular
sentence with foobar <-- like that
foobar beginning a sentence should invalidate
the entire sentence, same with at the end foobar
only foobar, and nothing else of the sentence
more words here more irrelevant stuff to highlight
and nothing of the key word
what about multiple foobar on the same foobar line?

What should be matched looks like this:

[image: match_highlighted.png]

The best I could get is /\b(?!foobar)[^\n]+\n?/g, which works only when the word foobar sits on a line of its own, like this:

not foobar
foobar (ignored)
totallynotfoobar
nobar
foobutts
foobar (ignored)
notagain

The rest gets matched... but that is not what I want.
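
A quick sketch of that behaviour in JavaScript. The three-line sample string is made up for illustration and is not the text from the question: a line consisting only of foobar is skipped, but a line that merely contains the word still matches in full.

```javascript
// The attempted pattern from above, applied to a short made-up sample.
const attempt = /\b(?!foobar)[^\n]+\n?/g;
const sample = "keep this\nfoobar\nhas foobar inside";

// With the g flag, String.prototype.match returns every match as an array.
const attemptMatches = sample.match(attempt);
console.log(attemptMatches);
// → ["keep this\n", "has foobar inside"]
```

The lookahead is only checked at the position where a match starts, so it cannot reject a foobar that appears later in the same line.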

So my question is: how would I accomplish the original example? Is it even possible?

1 answer:

Answer 0 (score: 1)

Here is one way (demo):

\W*\b(?!foobar).+?\b\W*

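The ? in .+? makes the quantifier lazy, so matching stops as soon as we reach a \b; otherwise the match could run past some occurrences of foobar.

The \W* parts are needed to consume any leading or trailing non-word characters in the string.

This produces a separate match for each word (together with its adjacent separators) rather than one match per stretch of text, which may not be ideal.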
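
To make the word-by-word matching concrete, here is a minimal JavaScript sketch. The helper name matchPieces and the sample strings are made up for illustration; only the pattern comes from the answer.

```javascript
// Pattern from the answer: every word except foobar, plus adjacent separators.
const wordPattern = /\W*\b(?!foobar).+?\b\W*/g;

// Hypothetical helper: collect every match of the pattern in `text`.
function matchPieces(text) {
  return text.match(wordPattern) || [];
}

console.log(matchPieces("one foobar two"));
// → ["one ", " two"]
console.log(matchPieces("this is fine, foobar is not"));
// → ["this ", "is ", "fine, ", " is ", "not"]
```

Joining the pieces back together gives the input with the word foobar removed, although the separators on both sides of it are kept.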

Full explanation

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \W*                      non-word characters (all but a-z, A-Z, 0-
                           9, _) (0 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    foobar                   'foobar'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  .+?                      any character except \n (1 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  \W*                      non-word characters (all but a-z, A-Z, 0-
                           9, _) (0 or more times (matching the most
                           amount possible))

A variant with lookbehind and lookahead (use with the /gs or /gm flags) (demo):

(?<=^|\bfoobar\b)(?!foobar\b)(.*?)(?=\bfoobar\b|$)

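I believe all those \b are needed to correctly handle every case where foobar appears as part of a larger word. (If those occurrences should be excluded as well, simply removing all the \b should work.)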
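
A rough JavaScript sketch of the lookbehind variant, assuming an engine with ES2018 variable-length lookbehind (current Node.js and most modern browsers). The helper name stretchesWithoutFoobar and the sample strings are made up for illustration. Because (.*?) may match the empty string, for example right after a foobar at the end of a line, the sketch filters empty matches out.

```javascript
// Lookbehind/lookahead variant from the answer, with the g and m flags.
// Variable-length lookbehind requires an ES2018+ regex engine (assumed here).
const betweenPattern = /(?<=^|\bfoobar\b)(?!foobar\b)(.*?)(?=\bfoobar\b|$)/gm;

// Hypothetical helper: return the non-empty stretches around foobar words.
function stretchesWithoutFoobar(text) {
  return (text.match(betweenPattern) || []).filter((piece) => piece !== "");
}

console.log(stretchesWithoutFoobar("one foobar two")); // → ["one ", " two"]
console.log(stretchesWithoutFoobar("x foobar"));       // → ["x "]
```

Unlike the first pattern, this one returns a single match per stretch of text between occurrences of foobar instead of one match per word.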