Question

I have this regex:

(?<!Sub ).*\(.*\)

I want it to match this:

MsgBox ("The total run time to fix the AREA and TD fields is: " & = imeElapsed & " minutes.")

But not this:

Sub ChangeAreaTD()

But somehow it still matches the one that starts with Sub. Does anyone know why? I thought I was excluding "Sub" with

(?<!Sub )
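To reproduce the problem, here is a minimal sketch assuming Python's re module (the question does not say which regex engine is used; other engines with fixed-width look-behind should behave the same way for this pattern). The MsgBox text is the English rendering of the sample line above.

```python
import re

# The regex from the question.
PATTERN = re.compile(r"(?<!Sub ).*\(.*\)")

MSGBOX_LINE = 'MsgBox ("The total run time to fix the AREA and TD fields is: " & = imeElapsed & " minutes.")'
SUB_LINE = "Sub ChangeAreaTD()"

for line in (MSGBOX_LINE, SUB_LINE):
    m = PATTERN.search(line)
    # Both lines match, and in both cases the match covers the whole line.
    print(repr(line), "->", m.group(0) if m else None)
```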

Any help is appreciated!

Thanks.

Answer 1

Do this:

^MsgBox .*\(.*\)

The problem is that a negative look-behind does not anchor the match to the start of the string. The match can begin anywhere.

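However, adding the ^ character at the start of the regex anchors it to the start of the string. Then, change Sub to MsgBox so that it only matches strings that start with MsgBox.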
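A quick check of this anchored pattern, as a sketch using Python's re module (an assumption; the asker's engine is not specified):

```python
import re

# Answer 1's pattern: the line must start with "MsgBox ".
PATTERN = re.compile(r"^MsgBox .*\(.*\)")

msgbox_line = 'MsgBox ("The total run time to fix the AREA and TD fields is: " & = imeElapsed & " minutes.")'
sub_line = "Sub ChangeAreaTD()"

print(bool(PATTERN.search(msgbox_line)))  # True
print(bool(PATTERN.search(sub_line)))     # False
```

Note that this only accepts lines starting with MsgBox; other calls would need their own alternatives.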

Answer 2

Your regex (?<!Sub ).*\(.*\), taken apart:

(?<!         # negative look-behind
  Sub        #   the string "Sub " must not occur before the current position
)            # end negative look-behind
.*           # anything       ~ matches up to the end of the string!
\(           # a literal "("  ~ causes the regex to backtrack to the last "("
  .*         # anything       ~ matches up to the end of the string again!
\)           # a literal ")"  ~ causes the regex to backtrack to the last ")"

So, with your test string:

Sub ChangeAreaTD()

the look-behind is satisfied immediately (at position 0, where nothing precedes the cursor).

After that, .* runs on to the end of the string.

Because of this .*, the look-behind never really makes a difference.
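This can be checked directly. A minimal sketch, assuming Python's re module: at index 0 the negative look-behind has nothing to inspect, so it succeeds, and the full pattern then matches from index 0 to the end of the line.

```python
import re

line = "Sub ChangeAreaTD()"

# At index 0 nothing precedes the cursor, so "not preceded by 'Sub '" is trivially true.
print(re.match(r"(?<!Sub )", line) is not None)  # True

# The full pattern therefore starts at index 0 and spans the whole 18-character line.
m = re.search(r"(?<!Sub ).*\(.*\)", line)
print(m.span())  # (0, 18)
```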

You were probably thinking of

(?<!Sub .*)\(.*\)

but your regex engine is unlikely to support variable-length look-behind.
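Python's built-in re module is one engine that rejects such a pattern at compile time. A small sketch showing this:

```python
import re

try:
    # Variable-length look-behind: ".*" inside (?<! ... ) has no fixed width.
    re.compile(r"(?<!Sub .*)\(.*\)")
    print("compiled")
except re.error as exc:
    # Python's re only accepts fixed-width look-behind, so we end up here.
    print("re.error:", exc)
```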

So I would do this instead (variable-length look-ahead is widely supported):

^(?!.*\bSub\b)[^(]+\(([^)]+)\)

Which translates to:

^            # At the start of the string,
(?!          # do a negative look-ahead:
  .*         #   anything
  \b         #   a word boundary
  Sub        #   the string "Sub"
  \b         #   another word boundary
)            # end negative look-ahead. If not found,
[^(]+        # match anything except an opening paren ~ to prevent backtracking
\(           # match a literal "("
(            # match group 1
  [^)]+      #   match anything up to a closing paren ~ to prevent backtracking
)            # end match group 1
\)           # match a literal ")".

Then take the contents of match group 1.
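A minimal sketch of using this pattern and reading group 1, assuming Python's re module:

```python
import re

# Answer 2's pattern: reject lines containing the word "Sub", then capture the text inside the parens.
PATTERN = re.compile(r"^(?!.*\bSub\b)[^(]+\(([^)]+)\)")

msgbox_line = 'MsgBox ("The total run time to fix the AREA and TD fields is: " & = imeElapsed & " minutes.")'
sub_line = "Sub ChangeAreaTD()"

m = PATTERN.search(msgbox_line)
print(m.group(1))                # the argument text between "MsgBox (" and the final ")"
print(PATTERN.search(sub_line))  # None
```

Because of \bSub\b, a call whose argument text contains the word "Sub" would also be rejected.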

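However, regular expressions are generally very poorly suited to parsing code. That is as true for VB code as it is for HTML. Even with the improved regex you will get false matches, for example here, because of the nested parens: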

MsgBox ("The total run time to fix all fields (AREA, TD) is: ...")
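Running the improved pattern on that line shows the false match. A sketch, again assuming Python's re module:

```python
import re

PATTERN = re.compile(r"^(?!.*\bSub\b)[^(]+\(([^)]+)\)")
nested = 'MsgBox ("The total run time to fix all fields (AREA, TD) is: ...")'

# [^)]+ stops at the first ")", which closes "(AREA, TD", not the MsgBox call.
print(PATTERN.search(nested).group(1))  # "The total run time to fix all fields (AREA, TD
```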

Answer 3

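You have a backtracking problem here. The first .* in (?<!Sub ).*\(.*\) can match ChangeAreaTD or hangeAreaTD. In the latter case, the 4 characters before it are "ub C", which does not match "Sub ". Because the look-behind is negative, this counts as a match!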
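The difference between starting at ChangeAreaTD and starting at hangeAreaTD can be seen in isolation. A small sketch, assuming Python's re module:

```python
import re

line = "Sub ChangeAreaTD()"

# "ChangeAreaTD()" only occurs at index 4, preceded by "Sub ", so the negative look-behind fails.
print(re.search(r"(?<!Sub )ChangeAreaTD\(\)", line))  # None

# "hangeAreaTD()" starts at index 5, preceded by "ub C", so the negative look-behind succeeds.
print(re.search(r"(?<!Sub )hangeAreaTD\(\)", line).start())  # 5
```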

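Adding ^ at the beginning of your regex won't help you, because a look-behind is a zero-width assertion. ^(?<!MsgBox ) would (in multiline mode) look for lines that follow a line ending in "MsgBox ". What you need to do instead is ^(?!Sub )(.*\(.*\)). This can be read as "Starting at the beginning of the string, make sure it does not start with Sub. Then, if it looks like a method call, capture everything in the string."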
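A sketch comparing the two, assuming Python's re module (no multiline flag): with ^ plus a look-behind the Sub line still matches, while ^ plus a negative look-ahead rejects it.

```python
import re

msgbox_line = 'MsgBox ("The total run time to fix the AREA and TD fields is: " & = imeElapsed & " minutes.")'
sub_line = "Sub ChangeAreaTD()"

# ^ with a look-behind: at index 0 the look-behind has nothing to check, so it still matches.
print(re.search(r"^(?<!Sub ).*\(.*\)", sub_line) is not None)  # True

# ^ with a negative look-ahead: the line itself must not start with "Sub ".
PATTERN = re.compile(r"^(?!Sub )(.*\(.*\))")
print(PATTERN.search(msgbox_line).group(1) == msgbox_line)  # True
print(PATTERN.search(sub_line))                             # None
```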

A good explanation of how regex engines process lookarounds can be found here.

Answer 4

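If you only want to match function calls and not declarations, then the part before the opening paren should not match just any characters; it is more likely an identifier followed by spaces. Hence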

(?<!Sub )[a-zA-Z][a-zA-Z0-9_]* *\(.*\)

The identifier part may need to allow more characters, depending on the language you are matching.
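A sketch of how this pattern behaves, assuming Python's re module. Anchored at the start of the line (re.match), it accepts the call and rejects the declaration. With re.search, which tries every start position, it can still begin one character into ChangeAreaTD, the same effect described in Answer 3.

```python
import re

PATTERN = re.compile(r"(?<!Sub )[a-zA-Z][a-zA-Z0-9_]* *\(.*\)")

msgbox_line = 'MsgBox ("The total run time to fix the AREA and TD fields is: " & = imeElapsed & " minutes.")'
sub_line = "Sub ChangeAreaTD()"

# re.match only tries index 0.
print(PATTERN.match(msgbox_line) is not None)  # True
print(PATTERN.match(sub_line))                 # None

# re.search can start at index 5, where the look-behind sees "ub C".
print(PATTERN.search(sub_line).group(0))       # hangeAreaTD()
```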

Why does this regex match?
