Question

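I am trying to replace all headings (h1, h2, h3, etc.) with <strong> tags using a regular expression, but it only replaces the first opening tag and the last closing tag.

Here is my code: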

<?php
$regex = '/<h(?:[\d]{1})(?:[^>]*)>([^<].*)<\/h(?:[\d]{1})>/mi';
$str = '<h1 class="text-align-center" style="font-size:22px;margin-top:0px;margin-bottom:0px;color:rgb(0,0,0);font-family:IntroBold, sans-serif;line-height:1.5;letter-spacing:0px;font-weight:700;text-align:center;">You should be&nbsp;confident solving wicked problems in a hybrid role between strategy, research, design and business&nbsp;through a discovery driven approach.&nbsp;</h1><p></p><h2 style="margin-top:0px;margin-bottom:.5em;font-family:IntroBold, sans-serif;font-size:19px;line-height:1em;text-transform:uppercase;letter-spacing:1px;font-weight:700;"><strong>KEY RESPONSIBILITIES</strong></h2>';
echo preg_replace($regex, '<strong>$1</strong>', $str);

The result is <strong>[...]</h1><p></p><h2...>[...]</strong>, which is of course wrong.

Answer 1

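As an alternative, you can use the Simple HTML DOM library (simple_html_dom).

It can do a lot of things, including what you are asking for.

Here is how you could implement it: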

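$dom = new simple_html_dom();
$dom->load($str); // $str is the HTML string from the question
foreach ($dom->find("h1,h2,h3,h4,h5,h6") as $e) {
    $e->outertext = "<strong>" . $e->innertext . "</strong>";
}
echo $dom->save();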

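This replaces every heading tag with strong. Note that the inline CSS on the heading itself is dropped, because only the innertext is kept.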

Answer 2

There is a much more efficient way to match headings:

<h(\d)[^>]*>([^<]*(<(?!\/h\1)[^<]*)*)<\/h\1>


* With this regex the engine finds a match in 61 steps. With the regex provided in the accepted answer, the engine needs far more steps (1193) to match the same part.
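To make the pattern above concrete, here is a minimal sketch of how it could be plugged into preg_replace. The sample string is a shortened stand-in I made up, not the full input from the question. One thing to watch: because group 1 now captures the heading level, the heading content ends up in group 2, so the replacement has to use $2 rather than $1.

```php
<?php
// Pattern from this answer: \1 forces the closing tag to have the same level
// as the opening tag, and the (?!\/h\1) lookahead stops at that closing tag.
$regex = '/<h(\d)[^>]*>([^<]*(<(?!\/h\1)[^<]*)*)<\/h\1>/i';

// Shortened sample input (an assumption, not the string from the question).
$str = '<h1 class="a">First <em>title</em></h1><p></p><h2 style="x"><strong>KEY</strong></h2>';

// Group 1 is the heading level, so the heading content is group 2.
$result = preg_replace($regex, '<strong>$2</strong>', $str);
echo $result;
```

Nested tags inside a heading (the <em> above) are kept, since the pattern only refuses to cross the matching closing heading tag.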

The right way:

Although regular expressions look handy in most cases, it is good practice to pick the right tool for the job: DOMDocument.

// $html holds the markup to process (e.g. $str from the question)
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$headings = $xpath->query("//h1 | //h2 | //h3 | //h4 | //h5 | //h6");
foreach ($headings as $h) {
    // nodeValue is the heading's text content only; nested tags are not kept
    $s = $dom->createElement("strong", $h->nodeValue);
    $h->parentNode->replaceChild($s, $h);
}
echo $dom->saveHTML();
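Since the snippet above builds the new element from nodeValue, any markup nested inside a heading is flattened to plain text. If that matters, a variation along these lines might help. It is only a sketch under my own assumptions: the sample $html is made up, and I load without the LIBXML flags and read the result back from <body>, to avoid depending on how the parser treats a fragment with several top-level elements.

```php
<?php
// Hypothetical sample fragment; the nested <em> shows that inner markup survives.
$html = '<h1 class="a">Title</h1><p>text</p><h2><em>KEY</em></h2>';

$dom = new DOMDocument();
// No LIBXML_HTML_NOIMPLIED here: let the parser add <html><body> and read the body back.
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//h1 | //h2 | //h3 | //h4 | //h5 | //h6') as $h) {
    $strong = $dom->createElement('strong');
    // Move the child nodes instead of copying nodeValue, so nested tags are kept.
    while ($h->firstChild) {
        $strong->appendChild($h->firstChild);
    }
    $h->parentNode->replaceChild($strong, $h);
}

// Serialize only the children of <body>, dropping the implied wrapper elements.
$out = '';
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $out .= $dom->saveHTML($child);
}
echo $out;
```

The serializer may insert line breaks between block-level elements, so compare the output loosely if you test it.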


Answer 3

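Obviously, regex is not a perfect solution for parsing HTML. If you want something safer, you should find an HTML parser and go that way.

However, this regular expression will do a decent job and works for the provided example: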

/<h\d.*?>(.*?)<\/h\d>/ims
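For anyone wondering why this works where the original did not: the question's pattern uses a greedy .* that runs on to the last closing heading tag, while .*? here stops at the first one. A small side-by-side sketch, using a shortened sample string of my own rather than the one from the question:

```php
<?php
// Shortened sample input (an assumption standing in for the question's string).
$str = '<h1 class="a">Title</h1><p></p><h2><strong>KEY</strong></h2>';

// Greedy pattern from the question: ".*" runs to the LAST closing heading tag.
$greedy = '/<h(?:[\d]{1})(?:[^>]*)>([^<].*)<\/h(?:[\d]{1})>/mi';
// Lazy pattern from this answer: ".*?" stops at the FIRST closing heading tag.
$lazy = '/<h\d.*?>(.*?)<\/h\d>/ims';

$greedyResult = preg_replace($greedy, '<strong>$1</strong>', $str);
$lazyResult = preg_replace($lazy, '<strong>$1</strong>', $str);

echo $greedyResult, "\n"; // one big match spanning both headings
echo $lazyResult, "\n";   // each heading replaced on its own
```

Keep in mind that the lazy pattern does not check that the opening and closing levels agree, so an <h1> closed by </h2> would still match.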

