I am trying to match <TAG2> as long as it is not inside <TAG>.
For example:
This is a WORD --- Match
<TAG><TAG2>xxx</TAG2></TAG> --- Not a match
<TAG>xxxxxxx<TAG2>yyyy</TAG2>xxxxxxx</TAG> --- Not a match
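I am using PHP, so I cannot use a variable-length negative lookbehind.
I tried the regex from "Match text not inside span tags", but it does not work in my case when there is more than one tag: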
<TAG><TAG2>xxx</TAG2></TAG>
<TAG><TAG2>xxx</TAG2></TAG>
This matches from the first <TAG2> to the end of the second </TAG2>. I assume this is because my regex includes <TAG2>[\s\S]*</TAG2>.
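The over-long match described above can be reproduced with a short sketch. It assumes the two example lines sit in one string, and the variable names are illustrative only. It compares the greedy [\s\S]* with the lazy [\s\S]*?; the lazy form only shortens the match and does not yet exclude <TAG2> inside <TAG>.

```php
<?php
// Two adjacent blocks, as in the example above.
$text = "<TAG><TAG2>xxx</TAG2></TAG>\n<TAG><TAG2>xxx</TAG2></TAG>";

// Greedy: [\s\S]* runs on to the LAST </TAG2>, so both blocks end up in one match.
preg_match_all('~<TAG2>[\s\S]*</TAG2>~', $text, $greedy);

// Lazy: [\s\S]*? stops at the FIRST </TAG2> that follows each <TAG2>.
preg_match_all('~<TAG2>[\s\S]*?</TAG2>~', $text, $lazy);

echo count($greedy[0]), "\n"; // number of greedy matches
echo count($lazy[0]), "\n";   // number of lazy matches
```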
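Answer 0 (score: 1)
I would recommend a parsing engine, but it sounds like you have creative control over the complexity of the HTML. As long as you have no complicated nesting or other odd edge cases, this should work: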
(<tag2>.*?</tag2>)|<tag>(?:(?!<tag\s?>).)*
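This regular expression will do the following:
- Match <tag2>...</tag2> and populate capture group 1, provided the tag is not already inside a <tag>...</tag> block, as in <tag>.<tag2>..</tag2>.</tag>
- Also match text that starts at <tag>, as in <tag>...</tag>, but for those matches capture group 1 has no value
Live demo: https://regex101.com/r/uQ7xR5/1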
Sample text
This <tag2>is a WORD</tag2> --- Match
<TAG><TAG2>xxx</TAG2></TAG> --- Not a match
<TAG>xxxxxxx<TAG2>yyyy</TAG2>xxxxxxx</TAG> --- Not a match
Sample matches
Note that capture group 1 is populated only by <tag2>...</tag2>, not by <tag>..</tag>:
[0][0] = <tag2>is a WORD</tag2>
[0][1] = <tag2>is a WORD</tag2>
[1][0] = <TAG><TAG2>xxx</TAG2></TAG> --- Not a match
[1][1] =
[2][0] = <TAG>xxxxxxx<TAG2>yyyy</TAG2>xxxxxxx</TAG> --- Not a match
[2][1] =
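The sample matches above could be reproduced in PHP roughly as follows. This is a minimal sketch, not a definitive implementation: the i flag is an assumption (the answer does not state its flags, but the sample mixes <tag2> and <TAG2>), and $outside is an illustrative variable name.

```php
<?php
// Sample text from the answer.
$text = <<<'TXT'
This <tag2>is a WORD</tag2> --- Match
<TAG><TAG2>xxx</TAG2></TAG> --- Not a match
<TAG>xxxxxxx<TAG2>yyyy</TAG2>xxxxxxx</TAG> --- Not a match
TXT;

// The answer's pattern. The "i" flag is an assumption, see the note above.
$pattern = '~(<tag2>.*?</tag2>)|<tag>(?:(?!<tag\s?>).)*~i';

// PREG_SET_ORDER gives one entry per match: [n][0] is the whole match,
// [n][1] is capture group 1 when it took part in the match.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

// Keep only the matches where capture group 1 was populated.
$outside = [];
foreach ($matches as $m) {
    if (isset($m[1]) && $m[1] !== '') {
        $outside[] = $m[1];
    }
}

print_r($outside);
```

The filter on group 1 is what separates wanted from unwanted matches; the second alternative exists only to consume <tag> blocks so that the first alternative never sees their contents. As written, that second alternative runs to the next <tag> or the end of the line, so a <tag2> that follows </TAG> on the same line is consumed as well.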
NODE EXPLANATION
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
<tag2> '<tag2>'
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
</tag2> '</tag2>'
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
<tag> '<tag>'
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
<tag '<tag'
----------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
----------------------------------------------------------------------
> '>'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
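The node-by-node breakdown above can also be written into the pattern itself. A minimal sketch, assuming PCRE's x (extended) flag, which ignores whitespace and allows # comments, plus the i flag, which is an assumption not stated in the answer. outside_tag2 is a hypothetical helper name, and it checks only the first match in the given line.

```php
<?php
// Hypothetical helper: returns the first <tag2>...</tag2> found outside <tag>,
// or null when the first match in the line is a <tag> block.
function outside_tag2(string $line): ?string
{
    $pattern = '~
        ( <tag2> .*? </tag2> )   # group 1: a <tag2>...</tag2> pair, lazily matched
        |                        # OR
        <tag>                    # a literal opening <tag>
        (?:                      # repeat zero or more times:
            (?! <tag \s? > )     #   stop before the next opening <tag>
            .                    #   otherwise take one character (not a newline)
        )*
    ~xi';

    if (preg_match($pattern, $line, $m) && !empty($m[1])) {
        return $m[1];
    }
    return null;
}

var_dump(outside_tag2('This <tag2>is a WORD</tag2> --- Match'));
var_dump(outside_tag2('<TAG><TAG2>xxx</TAG2></TAG> --- Not a match'));
```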