Consider the following string:
ab(cd.xz) e(ab(fg).xz)) ab(hi.xz)
I want to match every substring that starts after ab( and ends with z. So I wrote the following regular expression:
(?<=a.*?\().*?z
According to RegexBuddy, this should try to do the following:
Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=a.*?\()»
Match the character “a” literally «a»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “(” literally «\(»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “z” literally «z»
The matches I get in RegexBuddy are not all what I expect: the middle one is wrong, because it should have been fg).xz. What am I doing wrong?
Answer 0 (score: 4)
The regex works as designed :)
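In the second example, the lookbehind expression matches ab(cd.xz) e(. In effect, the lookbehind may start matching anywhere before the current position, as far back as the start of the string, so .*? matches more than you thought it would. It is not (as one might expect) confined to the text immediately before the current position.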
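So in the third example, the lookbehind can even match ab(cd.xz) e(ab(fg).xz)) ab(. It just happens to look as if it is working correctly, because the actual match starts after another ab(...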
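The behaviour described above can be reproduced with a short sketch. TypeScript is an assumption here (the question only mentions RegexBuddy); it is used because JavaScript engines that implement ES2018 accept variable-length lookbehind. The variable names are illustrative only.

```typescript
const input = "ab(cd.xz) e(ab(fg).xz)) ab(hi.xz)";

// The pattern from the question: (?<=a.*?\().*?z
// Built from a string, so the backslash before "(" is doubled.
const original = new RegExp("(?<=a.*?\\().*?z", "g");

// Collect every match together with the index where it starts.
const found: Array<[number, string]> = [];
let m: RegExpExecArray | null;
while ((m = original.exec(input)) !== null) {
  found.push([m.index, m[0]]);
}

console.log(found);
// → [[3, "cd.xz"], [12, "ab(fg).xz"], [27, "hi.xz"]]
```

The second match starts at index 12, right after e(. The lookbehind succeeds there because the a at index 0 is followed, some characters later, by the ( at index 11, and the dot is free to match everything in between, parentheses included.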
Solution: be more specific about what you allow to match. I suggest removing the parentheses from the allowed characters:
(?<=a[^()]*\().*?z
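As a quick check of this pattern, here is a minimal sketch, again assuming a JavaScript engine with ES2018 lookbehind support. The variable names are illustrative only.

```typescript
const text = "ab(cd.xz) e(ab(fg).xz)) ab(hi.xz)";

// The suggested pattern: (?<=a[^()]*\().*?z
// No parenthesis may appear between the "a" and the "(" that precedes the match.
const restricted = new RegExp("(?<=a[^()]*\\().*?z", "g");

// With the g flag, match() returns all matched substrings, or null if there are none.
const results = text.match(restricted) || [];

console.log(results);
// → ["cd.xz", "fg).xz", "hi.xz"]
```

The position after e( is now rejected: walking back from that ( the lookbehind runs into ) before it can reach an a, so the middle match starts after the inner ab( instead.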
Answer 1 (score: 0)
Going by your requirement, "starts after ab( and ends with z", the expression should be:
(?<=ab\().*?z
If you need to match a*(*z and capture only *z, then this expression will work:
(?<=a[^(]*\().*?z
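The two expressions in this answer can be compared side by side. This is a sketch under the same assumption as before (a JavaScript engine with ES2018 lookbehind); allMatches is a hypothetical helper, and the second input string is made up for illustration.

```typescript
// Hypothetical helper: return every match of a pattern (given as source text) in a subject.
function allMatches(source: string, subject: string): string[] {
  return subject.match(new RegExp(source, "g")) || [];
}

const sample = "ab(cd.xz) e(ab(fg).xz)) ab(hi.xz)";

// (?<=ab\().*?z : the match must be preceded by the literal text "ab(".
const fixedPrefix = allMatches("(?<=ab\\().*?z", sample);

// (?<=a[^(]*\().*?z : an "a", then anything except "(", then "(".
const noOpenParen = allMatches("(?<=a[^(]*\\().*?z", sample);

console.log(fixedPrefix); // → ["cd.xz", "fg).xz", "hi.xz"]
console.log(noOpenParen); // → ["cd.xz", "fg).xz", "hi.xz"]

// The two differ when something other than "b" sits between "a" and "(".
const other = "axy(12z";
const strictOnOther = allMatches("(?<=ab\\().*?z", other);
const looseOnOther = allMatches("(?<=a[^(]*\\().*?z", other);

console.log(strictOnOther); // → []
console.log(looseOnOther);  // → ["12z"]
```

On the string from the question both expressions give the same matches. Which one fits depends on whether the prefix is always exactly ab( or may be any a...( sequence.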