Question

I'm still fairly green when it comes to regular expressions. Here is what I'm trying to achieve:

Source:

<!-- Text --><b>Text</b>
    <a href="google.com">Link</a>
    <div class="col"><h1>Nested Content</h1><p>More content</p>
    </div>
<!-- END of Text -->
More text <!-- Another Tag Comment -->

Expected captures:

$1 = Text
$2 = <b>Text</b>
        <a href="google.com">Link</a>
        <div class="col"><h1>Nested Content</h1><p>More content</p>
        </div>
$3 = END of Text

Current regular expression:

/\<\!-*( *[A-Za-z]*) *-*\>([\s\S\t\r]*)\<\!-*( *[A-Za-z]*) *-*\>/igm

The problem is that it is too greedy and keeps matching until the last comment in the source:

$3 = Another Tag Comment

How can I restructure my regex so that the match ends at the expected capture?

Answer 1

<!--((?:(?!-->).)*)-->((?:(?!<!--)[\s\S])+)<!--((?:(?!-->).)*)-->

You can try this. See the demo:

https://regex101.com/r/cA4wE0/17
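
For readers who want to try the pattern outside regex101, here is a minimal sketch in Python. It assumes Python's `re` module (the question does not name a language), and the names `SOURCE`, `TEMPERED` and `capture_block` are made up for illustration. The idea is that `(?:(?!<!--)[\s\S])+` consumes one character at a time only while the next characters are not `<!--`, so the middle group cannot run past the next comment.

```python
import re

# Sample input copied from the question.
SOURCE = """<!-- Text --><b>Text</b>
    <a href="google.com">Link</a>
    <div class="col"><h1>Nested Content</h1><p>More content</p>
    </div>
<!-- END of Text -->
More text <!-- Another Tag Comment -->
"""

# Pattern from this answer, unchanged.
TEMPERED = re.compile(
    r"<!--((?:(?!-->).)*)-->((?:(?!<!--)[\s\S])+)<!--((?:(?!-->).)*)-->"
)


def capture_block(text):
    """Return (opening comment, body, closing comment), or None if nothing matches."""
    match = TEMPERED.search(text)
    if match is None:
        return None
    # The raw groups keep the padding inside the comments, so strip it here.
    return tuple(group.strip() for group in match.groups())


print(capture_block(SOURCE))
```

The comment groups use `.` without the DOTALL flag, so in this sketch a comment whose text spans several lines would not match.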

Answer 2

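You need to make the inner pattern [\s\S]* non-greedy, and you also need to add \s or a space to the last character class, [A-Za-z]*. Add the word boundary \b so that the string is matched exactly.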

\<\!-* *([A-Za-z]*) *-*\>([\s\S]*?)<!-* *(\b[A-Za-z ]*\b) *-*\>
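
To see what the non-greedy quantifier changes, the sketch below runs the pattern above next to a greedy variant that differs only in `*` versus `*?`. It assumes Python's `re` module; `SOURCE`, `LAZY` and `GREEDY` are illustrative names. The backslashes before `<`, `!` and `>` are dropped because those characters need no escaping in Python.

```python
import re

# Sample input copied from the question.
SOURCE = """<!-- Text --><b>Text</b>
    <a href="google.com">Link</a>
    <div class="col"><h1>Nested Content</h1><p>More content</p>
    </div>
<!-- END of Text -->
More text <!-- Another Tag Comment -->
"""

# Lazy middle group: stops at the first closing comment that fits.
LAZY = re.compile(
    r"<!-* *([A-Za-z]*) *-*>([\s\S]*?)<!-* *(\b[A-Za-z ]*\b) *-*>"
)

# Same pattern with a greedy middle group, for comparison only.
GREEDY = re.compile(
    r"<!-* *([A-Za-z]*) *-*>([\s\S]*)<!-* *(\b[A-Za-z ]*\b) *-*>"
)

lazy_match = LAZY.search(SOURCE)
greedy_match = GREEDY.search(SOURCE)

print("lazy   $1:", repr(lazy_match.group(1)))
print("lazy   $3:", repr(lazy_match.group(3)))
print("greedy $3:", repr(greedy_match.group(3)))
```

With this input the lazy version ends at END of Text, while the greedy one runs on to Another Tag Comment. The \b anchors also keep the padding spaces out of the third group.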


Regex to capture multi-line text inside and around HTML comment tags
