Question

This question is a follow-up to Recursive processing of markup using Regular Expression and DOMDocument.

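The code in the accepted answer helped a great deal in understanding how to build a basic syntax tree. However, I am now having trouble tightening the regex so that it matches only my syntax and not {. or {{. Ideally, I would like it to match only my syntax: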

{<anchor>}
{!image!}
{*strong*}
{/emphasis/}
{|code|}
{-strikethrough-}
{>small<}

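Two of the tags, a and small, also need a closing tag that differs from the opening one. I tried modifying $re_closetag from the original code example to reflect this, but it still matches too much of the text.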

For example:

http://www.google.com/>} bang 
smäll<} boom

My test string is:

tëstïng {{ 汉字/漢字 }} testing {<http://www.google.com/>} bang {>smäll<} boom {* strông{/ ëmphäsïs {- strïkë {| côdë |} -} /} *} {*wôw*} 1, 2, 3

Answer 1

You can control this either in the RE itself or after the match.

In the RE, to control which tags are allowed to "open", modify this part of $re_next:

(?:\{(?P<opentag>[^{\s]))  # match an open tag
      #which is "{" followed by anything other than whitespace or another "{"

At the moment it looks for any character that is not { or whitespace. Just change it to:

(?:\{(?P<opentag>[<!*/|>-]))

Now it looks only for your specific open-tag characters.
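
As a quick check of the tightened character class, here is a minimal sketch that runs only the open-tag alternative over a shortened version of the test string from the question. It is not the full $re_next, which is not reproduced here; the variable name $re_open and the /u modifier are assumptions made for illustration.

```php
<?php
// Hypothetical demo: only the open-tag alternative, not the full $re_next.
// The "/" inside the class is escaped because "/" is also the pattern delimiter.
$re_open = '/\{(?P<opentag>[<!*\/|>-])/u';

// Shortened version of the test string from the question.
$s = 'tëstïng {{ 汉字/漢字 }} testing {<http://www.google.com/>} bang {>smäll<} boom {*wôw*}';

// Collect every character that directly follows a "{" and is an allowed tag.
preg_match_all($re_open, $s, $m);

print_r($m['opentag']);
```

With this input only the <, > and * openers are reported. The {{ sequence is skipped, because neither { nor a space is in the character class.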

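The close-tag part only ever matches one character at a time, depending on which tag is open in the current context. (This is what the $opentag parameter is for.) So, to match a pair of different characters, just change the $opentag to look for in the recursive call. E.g.: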

        if (isset($m['opentag']) && $m['opentag'][1] !== -1) {
            list($newopen, $_) = $m['opentag'];

            // pick the close character to look for in the new context;
            // $newopen itself is left alone so the AST node is labelled with the tag that opened it
            $newclose = $newopen;
            if ($newopen==='>') $newclose = '<';
            else if ($newopen==='<') $newclose = '>';

            list($subast, $offset) = str_to_ast($s, $offset, array(), $newclose);
            $ast[] = array($newopen, $subast);
        } else if (isset($m['text']) && $m['text'][1] !== -1) {
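
If more asymmetric pairs are added later, the two-branch swap could also be written as a lookup table. The following is only a sketch: close_char_for() is a hypothetical helper name and is not part of the original str_to_ast() code.

```php
<?php
// Hypothetical helper; close_char_for() is not part of the original answer's code.
function close_char_for($open)
{
    // Tags whose closing character differs from the opening one.
    $pairs = array('<' => '>', '>' => '<');
    // Every other tag closes with the same character that opened it.
    return isset($pairs[$open]) ? $pairs[$open] : $open;
}

// Each tag from the question, mapped to the character expected before "}".
foreach (array('<', '!', '*', '/', '|', '-', '>') as $open) {
    echo '{' . $open . ' closes with ' . close_char_for($open) . "}\n";
}
```

In the loop above, the recursive call would then receive close_char_for(...) of the opening character as its last argument, while the AST node keeps the opening character as its label.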

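Alternatively, you can leave the RE as it is and decide after the fact what to do with a match. For example, if you match an @ character but {@ is not an allowed open tag, you can raise a parse error, or just treat it as a text node (append array('#text', '{@') to the ast), or anything in between.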
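
The post-match approach might look roughly like the sketch below. The function name classify_open(), the $strict flag and the 'tag' marker in the return value are all assumptions for illustration; only the array('#text', '{@') node shape comes from the code discussed here.

```php
<?php
// Hypothetical post-match check; all names here are illustrative assumptions.
function classify_open($char, $strict = false)
{
    // The open-tag characters listed in the question.
    $allowed = array('<', '!', '*', '/', '|', '-', '>');

    if (in_array($char, $allowed, true)) {
        // A real tag: the caller would recurse into a new context here.
        return array('tag', $char);
    }
    if ($strict) {
        // Strict mode: an unknown tag is a parse error.
        throw new UnexpectedValueException('Unknown open tag: {' . $char);
    }
    // Lenient mode: keep the two characters as literal text.
    return array('#text', '{' . $char);
}

print_r(classify_open('*'));
print_r(classify_open('@'));
```

Whether to throw or fall back to text is a policy choice. The lenient branch keeps unknown sequences such as {@ visible in the output, while the strict branch surfaces typos in the markup early.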
