Question

I have these preg_replace patterns and replacements:

$patterns = array(
    "/<br\W*?\/>/",
    "/<strong>/",
    "/<*\/strong>/",
    "/<h1>/",
    "/<*\/h1>/",
    "/<h2>/",
    "/<*\/h2>/",
    "/<em>/",
    "/<*\/em>/",
    '/(?:\<code*\>([^\<]*)\<\/code\>)/',
);
$replacements = array(
    "\n",
    "[b]",
    "[/b]",
    "[h1]",
    "[/h1]",
    "[h2]",
    "[/h2]",
    "[i]",
    "[/i]",
    '[code]***HTML DECODE HERE***[/code]',
);

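In my string I want to html_entity_decode the content between these tags: <code> &lt; &gt; </code> but keep my array structure for the preg replacement.

So this: <code> &lt; &gt; </code> would become this: [code] < > [/code]

Any help would be greatly appreciated, thanks!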

Answer 1

You cannot encode this in the replacement string. As PoloRM suggested, you can use preg_replace_callback specifically for your last replacement:

function decode_html($matches)
{
    return '[code]'.html_entity_decode($matches[1]).'[/code]';
}

$str = '<code> &lt; &gt; </code>';
$str = preg_replace_callback('/(?:\<code*\>([^\<]*)\<\/code\>)/', 'decode_html', $str);

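Equivalently, using create_function (note that create_function was deprecated in PHP 7.2 and removed in PHP 8.0, so this variant only works on older versions):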

$str = preg_replace_callback(
    '/(?:\<code*\>([^\<]*)\<\/code\>)/',
    create_function(
       '$matches',
        'return \'[code]\'.html_entity_decode($matches[1]).\'[/code]\';'
    ),
    $str
);

Or, as of PHP 5.3.0, with an anonymous function:

$str = preg_replace_callback(
    '/(?:\<code*\>([^\<]*)\<\/code\>)/',
    function ($matches) {
        return '[code]'.html_entity_decode($matches[1]).'[/code]';
    },
    $str
);

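But note that in all three cases, your pattern is not really optimal. Firstly, you don't need to escape < and > (but that only affects readability). Secondly, your first * allows unlimited repetition (or omission) of the letter e, so the pattern matches <cod> and <codeee> but not <code class="x">. I suppose you wanted to allow attributes. Thirdly, you cannot include other tags within your <code> (because [^<] does not match them). In that case, maybe you should go with ungreedy repetition instead (I also changed the delimiter for convenience):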

~(?:<code[^>]*>(.*?)</code>)~
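To make the difference concrete, here is a minimal sketch comparing the pattern from the question with the ungreedy one. The sample strings and variable names are made up for illustration and are not from the original post.

```php
<?php
// Pattern from the question vs. the ungreedy variant suggested above.
$old = '/(?:\<code*\>([^\<]*)\<\/code\>)/';
$new = '~(?:<code[^>]*>(.*?)</code>)~';

// Hypothetical sample inputs (assumptions, not from the original post).
$withAttribute = '<code class="php"> &lt; &gt; </code>';
$withNestedTag = '<code> &lt;<em>x</em>&gt; </code>';

// Shared callback: decode entities inside the captured group.
$decode = function ($matches) {
    return '[code]' . html_entity_decode($matches[1]) . '[/code]';
};

// The old pattern matches neither input, so both strings come back unchanged.
$oldAttr   = preg_replace_callback($old, $decode, $withAttribute);
$oldNested = preg_replace_callback($old, $decode, $withNestedTag);

// The new pattern accepts attributes and nested tags.
$newAttr   = preg_replace_callback($new, $decode, $withAttribute);
$newNested = preg_replace_callback($new, $decode, $withNestedTag);

echo $newAttr, "\n", $newNested, "\n";
```

One caveat not covered above: without the s modifier, the dot does not match newlines, so a multi-line <code> block would probably need ~...~s.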

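As you can already see, this is still far from perfect (in terms of matching HTML correctly in the first place). Hence, the obligatory reminder: don't use regex to parse HTML. You will be much better off with a DOM parser. PHP ships with a built-in one, and there is also this very convenient-to-use 3rd-party one.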
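As a rough illustration of the DOM route, the sketch below uses PHP's built-in DOMDocument class (it requires the dom extension) to read the content of every <code> element. The input string is hypothetical, and the sketch only extracts the decoded text; it does not rebuild the whole document as BBCode.

```php
<?php
// Hypothetical input; the surrounding <p> is only there for illustration.
$html = '<p>Example: <code class="php"> &lt; &gt; </code></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$decoded = array();
foreach ($doc->getElementsByTagName('code') as $node) {
    // The parser has already decoded the entities, so textContent
    // holds literal < and > characters; no html_entity_decode call is needed.
    $decoded[] = '[code]' . $node->textContent . '[/code]';
}

print_r($decoded);
```

Attributes on the tag and nested markup are handled by the parser here, which is the main reason this tends to be more robust than a pattern.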

Answer 2

Take a look at this:

http://www.php.net/manual/en/function.preg-replace-callback.php

You can create a callback function that applies html_entity_decode to your matches.
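One way to combine this with the existing arrays is a two-step sketch: keep the static tag replacements in the parallel arrays and run the callback only for <code>. The input string is hypothetical, only a subset of the question's patterns is shown, and the <code> pattern is the ungreedy variant rather than the one from the question.

```php
<?php
// Static tag-to-BBCode pairs stay in the two parallel arrays
// (a subset of the question's list, for brevity).
$patterns = array(
    "/<br\W*?\/>/",
    "/<strong>/",
    "/<*\/strong>/",
);
$replacements = array(
    "\n",
    "[b]",
    "[/b]",
);

// Hypothetical input string (an assumption for illustration).
$str = '<strong>Tip:</strong><br /><code> &lt; &gt; </code>';

// Step 1: plain replacements, applied in array order.
$str = preg_replace($patterns, $replacements, $str);

// Step 2: the one replacement that needs code, via a callback.
$str = preg_replace_callback(
    '~<code[^>]*>(.*?)</code>~',
    function ($matches) {
        return '[code]' . html_entity_decode($matches[1]) . '[/code]';
    },
    $str
);

echo $str, "\n";
```

On PHP 7.0 and later, preg_replace_callback_array accepts a single associative array of pattern => callback pairs, which may be closer to the single-array structure the question asks for, at the cost of writing every replacement as a callback.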

html_entity_decode in a specific regex for preg_replace
