Question

I'm trying to match '<TAG2>' as long as it is not inside <TAG>.

For example:

This is a WORD --- Match
<TAG><TAG2>xxx</TAG2></TAG> --- Not a match
<TAG>xxxxxxx<TAG2>yyyy</TAG2>xxxxxxx</TAG>  --- Not a match

I'm using PHP, so I can't use a variable-length negative lookbehind.

I tried the regex from "Match text not inside span tags", but it doesn't work in my case when there is more than one tag.

<TAG><TAG2>xxx</TAG2></TAG>
<TAG><TAG2>xxx</TAG2></TAG>

This matches from the first <TAG2> to the end of the second </TAG2>. I assume this is because my regex includes <TAG2>[\s\S]*</TAG2>.
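
The over-long match described above is what a greedy quantifier produces: `[\s\S]*` runs to the last `</TAG2>` in the input, while the lazy form `[\s\S]*?` stops at the first one. A minimal sketch of the difference follows. It uses Python's `re` module purely for illustration (PHP's PCRE handles these two quantifiers the same way), and it covers only the fragment quoted in the question, not the asker's full regex, which is not shown.

```python
import re

# The two-line sample from the question.
text = "<TAG><TAG2>xxx</TAG2></TAG>\n<TAG><TAG2>xxx</TAG2></TAG>"

# Greedy: [\s\S]* grabs as much as it can, so one match spans both lines.
greedy = re.findall(r"<TAG2>[\s\S]*</TAG2>", text)

# Lazy: [\s\S]*? stops at the first </TAG2>, giving one match per tag pair.
lazy = re.findall(r"<TAG2>[\s\S]*?</TAG2>", text)

print(len(greedy), len(lazy))  # 1 2
```

Making the quantifier lazy fixes the span problem, but it does not by itself exclude a <TAG2> that sits inside <TAG>.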

Answer 1

Foreword

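I'd recommend using a parsing engine, but it sounds like you have creative control over how complex the HTML is. So as long as you don't have complicated nesting or other odd edge cases, this should work.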

Description

(<tag2>.*?</tag2>)|<tag>(?:(?!<tag\s?>).)*


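This regular expression will do the following:

Populate capture group 1 with <tag2>...</tag2>, provided that tag is not already enclosed in <tag>...</tag>, as in <tag>.<tag2>..</tag2>.</tag>
It also matches each run that starts at <tag> (up to the next <tag> or the end of the line), but for those matches capture group 1 has no value.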
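
The "discard unless group 1 is set" step can be sketched as below. This is an illustrative Python version, not the answer's own code: the helper name `tag2_outside_tag` is hypothetical, and the case-insensitive flag is an assumption made so that the lowercase pattern also matches the uppercase tags in the sample text. In PHP the same idea should carry over to `preg_match_all` with the `i` modifier, keeping only matches whose group 1 is non-empty.

```python
import re

# The answer's pattern. IGNORECASE is an assumption so that the lowercase
# pattern also matches the uppercase <TAG>/<TAG2> in the sample text.
PATTERN = re.compile(r"(<tag2>.*?</tag2>)|<tag>(?:(?!<tag\s?>).)*", re.IGNORECASE)


def tag2_outside_tag(text):
    """Return the <tag2>...</tag2> spans that are not wrapped in <tag>."""
    # Matches produced by the second alternative leave group 1 as None,
    # so they are consumed (and skipped) rather than reported.
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


sample = (
    "This <tag2>is a WORD</tag2> --- Match\n"
    "<TAG><TAG2>xxx</TAG2></TAG> --- Not a match\n"
    "<TAG>xxxxxxx<TAG2>yyyy</TAG2>xxxxxxx</TAG>  --- Not a match\n"
)

print(tag2_outside_tag(sample))  # ['<tag2>is a WORD</tag2>']
```

The second alternative exists only to swallow text that begins with <tag>, so the first alternative never gets a chance to see a <tag2> inside it.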

Example

Live demo

https://regex101.com/r/uQ7xR5/1

Sample text

This <tag2>is a WORD</tag2> --- Match
<TAG><TAG2>xxx</TAG2></TAG> --- Not a match
<TAG>xxxxxxx<TAG2>yyyy</TAG2>xxxxxxx</TAG>  --- Not a match

Sample matches

Note that capture group 1 is populated only by a <tag2>...</tag2> that is not wrapped in <tag>..</tag>.

[0][0] = <tag2>is a WORD</tag2>
[0][1] = <tag2>is a WORD</tag2>

[1][0] = <TAG><TAG2>xxx</TAG2></TAG> --- Not a match
[1][1] = 

[2][0] = <TAG>xxxxxxx<TAG2>yyyy</TAG2>xxxxxxx</TAG>  --- Not a match
[2][1] =

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    <tag2>                   '<tag2>'
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
    </tag2>                  '</tag2>'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  <tag>                    '<tag>'
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
      <tag                     '<tag'
----------------------------------------------------------------------
      \s?                      whitespace (\n, \r, \t, \f, and " ")
                               (optional (matching the most amount
                               possible))
----------------------------------------------------------------------
      >                        '>'
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
    .                        any character except \n
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
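
The `(?:(?!<tag\s?>).)*` construct in the table can be tried in isolation. The sketch below is illustrative only: the input string is made up, and the case-insensitive flag is an assumption. It shows that each run stops just before the next `<tag>` or at the end of the line, and that `<tag2>` does not stop it.

```python
import re

# Before consuming each character, the negative lookahead checks that
# "<tag>" (or "<tag >") does not start at this position. The "." does not
# cross newlines by default, so a run also ends at the end of the line.
tempered = re.compile(r"<tag>(?:(?!<tag\s?>).)*", re.IGNORECASE)

# Made-up input for illustration.
text = "<tag>one<tag2>two</tag2><tag>three\n<tag>four"

runs = tempered.findall(text)
print(runs)  # ['<tag>one<tag2>two</tag2>', '<tag>three', '<tag>four']
```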
