Question

我有这个HTML文档

{*
<h2 class="block_title bg0">ahmooooooooooooooooooooooooooooooooooooooooooodi</h2>
<div class="block_content padding bg0">{welc_msg}</div>
<br/>
    {*
    hii<br /><span>5
    *}

    {*
    hii<br /><span>5

    *}
*}

我想将其删除，因此我想删除{* *}

之间的任何内容

我写了正则表达式：

preg_replace("#(\{\*(.*?)\*\})+#isx",'',$html);

并且它可以正常工作，但它理想情况下不能100％工作，最后会留下*}。

你能给我一个真实的模式吗？

Answer 1

您需要recursive regex来匹配嵌套的括号。它应该是这样的：

"#(\{\*([^{}]*?(?R)[^{}]*?)\*\})+#isx"

Answer 2

如果您的正则表达式引擎支持匹配嵌套结构（和PHP一样），那么您可以在一个遍中删除（可能是嵌套的）元素，如下所示：

递归正则表达式应用一遍：

function stripNestedElementsRecursive($text) {
    return preg_replace('/
        # Match outermost (nestable) "{*...*}" element.
        \{\*        # Element start tag sequence.
        (?:         # Group zero or more element contents alternatives.
          [^{*]++   # Either one or more non-start-of-tag chars.
        | \{(?!\*)  # or "{" that is not beginning of a start tag.
        | \*(?!\})  # or "*" that is not beginning of an end tag.
        | (?R)      # or a valid nested matching tag element.
        )*          # Zero or more element contents alternatives.
        \*\}        # Element end tag sequence.
        /x', '', $text);
}

上述递归正则表达式匹配最外层 {*...*}元素，该元素可能包含嵌套元素。

但是，如果您的正则表达式引擎不支持匹配嵌套结构，您仍然可以完成工作，但不能一次完成。可以制作与最里面的 {*...*}元素匹配的正则表达式（即不包含任何嵌套元素的元素）。这个正则表达式可以以递归方式应用，直到文本中没有更多元素为止：

递归应用非递归正则表达式：

function stripNestedElementsNonRecursive($text) {
    $re = '/
        # Match innermost (not nested) "{*...*}" element.
        \{\*        # Element start tag sequence.
        (?:         # Group zero or more element contents alternatives.
          [^{*]++   # Either one or more non-start-of-tag chars.
        | \{(?!\*)  # or "{" that is not beginning of a start tag.
        | \*(?!\})  # or "*" that is not beginning of an end tag.
        )*          # Zero or more element contents alternatives.
        \*\}        # Element end tag sequence.
        /x';
    while (preg_match($re, $text)) {
        $text = preg_replace($re, '', $text);
    }
    return $text;
}

使用正则表达式处理嵌套结构是一个高级主题，必须谨慎行事！如果真的想要将regex用于此类高级应用程序，我强烈建议您阅读这方面的经典工作主题：Mastering Regular Expressions (3rd Edition)作者：Jeffrey Friedl。我可以诚实地说，这是我读过的最有用的书。

快乐的复兴！

这个嵌套{* *}的正则表达式模式有什么问题？

2 个答案:

递归正则表达式应用一遍：

递归应用非递归正则表达式：