Problem with positive lookbehind and repeated patterns

Time: 2011-06-04 05:09:45

Tags: .net regex

Consider the following string:

ab(cd.xz) e(ab(fg).xz)) ab(hi.xz)

I want to match every substring that starts after ab( and ends with z. So I wrote the following regex:

(?<=a.*?\().*?z

According to RegexBuddy, this should try to do the following:

Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=a.*?\()»
   Match the character “a” literally «a»
   Match any single character that is not a line break character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the character “(” literally «\(»
Match any single character that is not a line break character «.*?»
   Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “z” literally «z»

The result I get in RegexBuddy is the matches shown below. Note that the middle match is wrong: it should be fg).xz. What am I doing wrong?

http://img101.imageshack.us/img101/7753/regex.jpg

2 answers:

Answer 0: (score: 4)

The regex is working as designed :)

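In the second match, the lookbehind expression matches ab(cd.xz) e(. The lookbehind is free to begin anywhere earlier in the string, and .*? can span any characters, parentheses included, so it matches far more than you think. It is not (as one might expect) limited to the ab( closest to the current position.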

So in the third match, the lookbehind can even match ab(cd.xz) e(ab(fg).xz)) ab(. It only appears to work correctly because the actual match happens to start after another ab( ...
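To see this concretely, here is a minimal sketch that probes every position in the string and records where the lookbehind alone succeeds. It is written in TypeScript rather than .NET, on the assumption that ECMAScript (ES2018+) lookbehind accepts this pattern and gives the same results for this input; the helper name positionsWhere is made up for illustration.

```typescript
const input: string = "ab(cd.xz) e(ab(fg).xz)) ab(hi.xz)";

// Return every index at which a zero-width pattern succeeds.
// The sticky flag ("y") makes the regex test exactly at lastIndex.
function positionsWhere(zeroWidth: string, text: string): number[] {
  const probe = new RegExp(zeroWidth, "y");
  const found: number[] = [];
  for (let i = 0; i <= text.length; i++) {
    probe.lastIndex = i;
    if (probe.test(text)) {
      found.push(i);
    }
  }
  return found;
}

// The lookbehind from the question, on its own.
const positions = positionsWhere("(?<=a.*?\\()", input);
console.log(positions); // [3, 12, 15, 27]

// The full pattern from the question, scanned left to right.
const matches: string[] = input.match(new RegExp("(?<=a.*?\\().*?z", "g")) || [];
console.log(matches); // ["cd.xz", "ab(fg).xz", "hi.xz"]
```

Position 12 (just after e() qualifies because an a exists somewhere before it. Position 15 (just after the inner ab() qualifies too, but the scan reaches position 12 first, and the match starting there swallows ab(fg).xz.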

Solution: be more specific about what you allow to match. I suggest excluding parentheses from the allowed characters:

(?<=a[^()]*\().*?z
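As a quick check of this pattern, the sketch below runs it against the string from the question. TypeScript is used as a stand-in for .NET (an assumption: ES2018+ lookbehind behaves the same way for this pattern and input), and allMatches is a hypothetical helper, not part of any library.

```typescript
const input: string = "ab(cd.xz) e(ab(fg).xz)) ab(hi.xz)";

// Collect all non-overlapping matches of a pattern, left to right.
function allMatches(pattern: string, text: string): string[] {
  return text.match(new RegExp(pattern, "g")) || [];
}

// [^()]* cannot cross a parenthesis, so the "a" must sit in the same
// parenthesis-free run as the "(" that precedes the match.
const corrected = allMatches("(?<=a[^()]*\\().*?z", input);
console.log(corrected); // ["cd.xz", "fg).xz", "hi.xz"]
```

The position after e( no longer qualifies, because the only text between it and an earlier a contains parentheses.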

Answer 1: (score: 0)

Going by your requirement, "starts after ab( and ends with z", the expression should be:

(?<=ab\().*?z

If you instead need to match a*(*z (with * as a wildcard) and capture only the *z part, then this expression will work:

(?<=a[^(]*\().*?z
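A small sketch comparing the two expressions from this answer. It uses TypeScript as a stand-in for .NET (assuming ES2018+ lookbehind treats these patterns the same way), the helper allMatches is hypothetical, and the second input string is invented here purely to show where [^(] and [^()] differ.

```typescript
const input: string = "ab(cd.xz) e(ab(fg).xz)) ab(hi.xz)";

// Collect all non-overlapping matches of a pattern, left to right.
function allMatches(pattern: string, text: string): string[] {
  return text.match(new RegExp(pattern, "g")) || [];
}

// Both expressions agree on the string from the question.
const literalPrefix = allMatches("(?<=ab\\().*?z", input);
const noOpenParen = allMatches("(?<=a[^(]*\\().*?z", input);
console.log(literalPrefix); // ["cd.xz", "fg).xz", "hi.xz"]
console.log(noOpenParen);   // ["cd.xz", "fg).xz", "hi.xz"]

// Invented input: a ")" sits between the "a" and the "(".
const other: string = "a) (xz";
const allowsClose = allMatches("(?<=a[^(]*\\().*?z", other);
const forbidsBoth = allMatches("(?<=a[^()]*\\().*?z", other);
console.log(allowsClose); // ["xz"]
console.log(forbidsBoth); // []
```

Which behaviour is wanted depends on whether a closing parenthesis between the a and the ( should disqualify a match.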