Question

I want to match the following sentences:

<b>(ABC)</b>
<b> (ABC) </b>
<b> abc (ABC) fgt </b>

The pattern is as follows:

"(<b>.*?\()([A-Z]+)(\).*?</b>)"

This works for the examples above, but if the sentence looks like this:

<b></b>(ABCA)<b>(ABCB)</b>

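then I get the wrong match. The regex starts at the first occurrence of <b> and matches up to the first (. After that, it skips everything up to </b>. That is wrong. The correct match must be (ABCB). How can I fix this?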
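To make the failure concrete, here is a minimal Python sketch that runs the pattern from the question against the sample inputs. Python's `re` module is an assumption on my part, since the question does not name a regex engine; the helper name `first_code` is also made up for illustration.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"(<b>.*?\()([A-Z]+)(\).*?</b>)")

def first_code(text):
    """Return group 2 of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(2) if m else None

# The three inputs that work as intended.
for sample in ["<b>(ABC)</b>", "<b> (ABC) </b>", "<b> abc (ABC) fgt </b>"]:
    print(sample, "->", first_code(sample))

# The problematic input: the lazy .*? is still allowed to walk across
# the closing </b> and the next <b>, so the match starts at the first <b>.
problem = "<b></b>(ABCA)<b>(ABCB)</b>"
m = pattern.search(problem)
print(m.group(0))  # the entire input string
print(m.group(2))  # ABCA, not the desired ABCB
```

Laziness only makes `.*?` prefer shorter matches; it does not stop it from consuming tags when that is the only way to reach a `(`.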

Answer 1

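If you want to keep the regex from crossing tag boundaries, the .* "match anything" token is too loose, because "anything" also covers the tags themselves.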

You can use a negative lookahead assertion to make sure that <b> and </b> cannot be part of the match:

(<b>(?:(?!</?b>).)*\()([A-Z]+)(\)(?:(?!</?b>).)*</b>)

Test it live on regex101.com.

Explanation:

(         # Match into group 1:
 <b>      # <b>
 (?:      # Start of non-capturing group
  (?!     # Match only if it's impossible to match
    </?b> # <b> or </b>
  )       # (End of lookahead assertion)
  .       # Match any character
 )*       # Repeat as many times as possible
 \(       # Then match a (
)         # End of group 1
([A-Z]+)  # Match one or more uppercase ASCII letters --> group 2
(         # Match into group 3:
 \)       # Match )
 (?:(?!</?b>).)* # as before, match anything except <b> or </b>
 </b>     # Match </b>
)         # End of group 3
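
As a quick check, the pattern above can be tried on the problematic input. This is a sketch assuming Python's `re` module (the answer itself is engine-agnostic); the variable names are illustrative only.

```python
import re

# The lookahead-based pattern from this answer, unchanged.
pattern = re.compile(r"(<b>(?:(?!</?b>).)*\()([A-Z]+)(\)(?:(?!</?b>).)*</b>)")

text = "<b></b>(ABCA)<b>(ABCB)</b>"
m = pattern.search(text)

# The attempt at the first <b> fails: the very next characters are </b>,
# so the guarded dot cannot advance and there is no ( to match.
# The search then succeeds at the second <b>.
print(m.group(0))  # <b>(ABCB)</b>
print(m.group(2))  # ABCB

# The original examples from the question still match.
for sample in ["<b>(ABC)</b>", "<b> (ABC) </b>", "<b> abc (ABC) fgt </b>"]:
    print(sample, "->", pattern.search(sample).group(2))
```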

Answer 2

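Replace .*? in your regex with [^<>]* so that it matches any character except < or >, zero or more times. This ensures that no tag can appear between the opening and closing tags.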

(<b>[^<>]*?\()([A-Z]+)(\)[^<>]*?</b>)
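
A short sketch of how this behaves, again assuming Python's `re` module, which the answer does not specify. It also compares against the lookahead pattern from Answer 1 on an input with a nested `<i>` tag; that input is my own illustration, not from the question.

```python
import re

# Pattern from this answer: nothing between <b> and </b> may contain < or >.
char_class = re.compile(r"(<b>[^<>]*?\()([A-Z]+)(\)[^<>]*?</b>)")

# Pattern from Answer 1, for comparison: only <b> and </b> are excluded.
lookahead = re.compile(r"(<b>(?:(?!</?b>).)*\()([A-Z]+)(\)(?:(?!</?b>).)*</b>)")

text = "<b></b>(ABCA)<b>(ABCB)</b>"
m = char_class.search(text)
print(m.group(0))  # <b>(ABCB)</b>
print(m.group(2))  # ABCB

# Hypothetical input with another tag nested inside <b>...</b>.
nested = "<b><i>x</i> (ABC)</b>"
print(char_class.search(nested))           # None: [^<>] refuses to cross <i>
print(lookahead.search(nested).group(2))   # ABC
```

The character class is simpler and usually faster, but it rejects any other tag or stray angle bracket inside the bold element. Which behavior is preferable depends on the input you expect.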


RegEx: match up to a specific word
