Convert uppercase H1, H2, ... tags to capitalized headings with PHP

Date: 2015-01-20 02:18:43

Tags: php regex

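I want to use PHP to convert the all-uppercase text inside h1, h2, ... tags to capitalized text (first letter uppercase, the rest lowercase). I'm close, but not quite there. The snippet below does not uppercase the first character of "LOREM" (probably because it tries to uppercase the '<'). Modifying the PHP callback would be easy, but I was hoping to achieve this by changing only the regular expression: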

$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";

$line = preg_replace_callback(
    '/<h[1-9]>(.*)\>/i',
    function ($matches) {
        return ucfirst(strtolower($matches[0]));
    },
    $var
);

print($line);

Result:

<h1>lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<H2>lorem ipsum dolores amet</H2>

Desired output:

<h1>Lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<H2>Lorem ipsum dolores amet</H2>

4 answers:

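Answer 0 (score: 3)

By using $matches[0] you are returning the entire match, so the tags are transformed together with the text. Use lookarounds in this case (here a lookahead combined with \K), so that only the heading text is part of the match.

I suggest capturing the heading level in the opening <h...> tag so that it can be used as a backreference; the pattern then only matches a closing tag with the same level as the one captured by the group.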

$text = preg_replace_callback('~<h([1-9])>\K[^<]++(?=</h\1>)~i', 
      function($m) {
         return ucfirst(strtolower($m[0]));
      }, $text);

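For reference, here is a minimal self-contained sketch that applies the pattern above to the sample input from the question. The function name capitalizeHeadings is hypothetical and only used for illustration; the pattern is the one from this answer.

```php
<?php
// Hypothetical helper name; the pattern itself is the one from this answer.
function capitalizeHeadings(string $html): string
{
    $result = preg_replace_callback(
        '~<h([1-9])>\K[^<]++(?=</h\1>)~i',
        function (array $m): string {
            // $m[0] holds only the heading text: \K dropped the opening tag
            // from the match and the lookahead never consumed the closing tag.
            return ucfirst(strtolower($m[0]));
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";

echo capitalizeHeadings($var);
```

Note that [^<]++ stops at the first "<", so a heading that contains nested markup, for example <h1>HELLO <em>WORLD</em></h1>, is left unchanged by this pattern.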

Although you can do this with a regular expression, I recommend using DOM instead:

// loadHTML() is an instance method; calling it statically fails on PHP 8
$doc = new DOMDocument();
$doc->loadHTML('
    <h1>LOREM IPSUM DOLORES AMET</h1>
    THIS IS SOME TEXT
    <H2>LOREM IPSUM DOLORES AMET</H2>
');

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//h1|//h2|//h3|//h4|//h5|//h6');

foreach ($nodes as $node) {
  $node->nodeValue = ucfirst(strtolower($node->nodeValue));
}

echo $doc->saveHTML(); 
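One caveat with the DOM snippet: assigning to nodeValue on an element replaces all of its children with a single text node, so inline markup inside a heading would be lost. The following is a rough sketch, not part of the original answer, that rewrites only the text nodes below each heading. The sample heading with an <em> element is an assumption made for illustration.

```php
<?php
$doc = new DOMDocument();
// Assumed sample input with nested inline markup.
$doc->loadHTML('<h1>LOREM <em>IPSUM</em> DOLORES</h1>');
$xpath = new DOMXPath($doc);

foreach ($xpath->query('//h1|//h2|//h3|//h4|//h5|//h6') as $heading) {
    $capitalized = false;
    // Walk the text nodes below the heading in document order.
    foreach ($xpath->query('.//text()', $heading) as $textNode) {
        $text = strtolower($textNode->nodeValue);
        if (!$capitalized && trim($text) !== '') {
            // Uppercase the first non-whitespace character of the heading.
            $lead = strlen($text) - strlen(ltrim($text));
            $text = substr($text, 0, $lead) . ucfirst(substr($text, $lead));
            $capitalized = true;
        }
        $textNode->nodeValue = $text;
    }
}

// Serialize just the heading to inspect the result.
$h1 = $doc->getElementsByTagName('h1')->item(0);
echo $doc->saveHTML($h1);
```

With this approach the <em> element is kept, and only the first text node that contains visible characters gets its first letter uppercased.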

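Answer 1 (score: 1)

It is not $matches[0] but $matches[1]. $matches[0] refers to the whole match (that is, ucfirst and strtolower are applied to the whole match, tags included), whereas $matches[1] refers to the characters captured by group index 1. Because the regex includes <h[1-9]>, it also matches the opening <h> tag. In the replacement, however, we return only group index 1, as in ucfirst(strtolower($matches[1])), so the opening <h> tag is removed. See the following example.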

$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";

$line = preg_replace_callback(
    '/<h[1-9]>(.*)\>/i',
    function ($matches) {
        return ucfirst(strtolower($matches[1]));
    },
    $var
);

print($line);

Output:

Lorem ipsum dolores amet</h1
THIS IS SOME TEXT
Lorem ipsum dolores amet</h2

But the code above also replaces the <h1> tag. So I suggest applying the strtolower and ucfirst functions only to the part inside the <h> tags.

$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";

$line = preg_replace_callback(
        '/<h[1-9]>\K.*?(?=<)/i',
        function ($matches) {
            return ucfirst(strtolower($matches[0]));
        },
        $var
);

print($line);

Output:

<h1>Lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<H2>Lorem ipsum dolores amet</H2>

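\K discards the characters matched so far from the reported match, so the opening tag is not part of $matches[0]. .*? performs a non-greedy match of any character, zero or more times, and the lookahead (?=<) makes it stop just before the next < character without consuming it.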
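The difference between \K with a lookahead and a plain capturing group can be seen with preg_match on a single heading. This is a small illustrative sketch; the variable names $withK and $withGroup are made up for the example.

```php
<?php
$subject = '<h1>LOREM IPSUM</h1>';

// With \K and a lookahead, the reported match is only the heading text.
preg_match('/<h[1-9]>\K.*?(?=<)/i', $subject, $withK);

// With a capturing group, index 0 is still the full match (tag included)
// and the heading text is only available at index 1.
preg_match('/<h[1-9]>(.*?)</i', $subject, $withGroup);

echo $withK[0], "\n";      // LOREM IPSUM
echo $withGroup[0], "\n";  // <h1>LOREM IPSUM<
echo $withGroup[1], "\n";  // LOREM IPSUM
```

Because preg_replace_callback replaces everything in index 0, the \K variant lets the callback return the transformed text without having to re-emit the tags.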

Answer 2 (score: 1)

No regex needed. Obligatory link. Don't use regex to parse HTML. Ever.


<?php

$HTMLString = <<<HTML

<h1>lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<h2>lorem ipsum dolores amet</h2>

HTML;

$doc = new DOMDocument();

$doc->loadHTML($HTMLString);

//You can also use xpath. Loop results after using this instead:
//$xpath = new DOMXPath($doc);
//$nodeList = $xpath->query('//h2');

$nodeList = $doc->getElementsByTagName('h2');

foreach ($nodeList as $node) {

    $stringArray = explode(' ', $node->nodeValue);
    $stringArray[0] = ucfirst($stringArray[0]);
    $capitalizedSentence = implode(' ', $stringArray);
    echo $capitalizedSentence;
}

From:

lorem ipsum dolores amet

To:

Lorem ipsum dolores amet
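ucfirst and strtolower operate on single bytes, so depending on the PHP version and locale they may leave non-ASCII letters such as "É" untouched. If the headings can contain such characters, a multibyte-aware variant might look like the sketch below. It assumes the mbstring extension is available and that the text is UTF-8; the helper name mbSentenceCase is hypothetical.

```php
<?php
// Hypothetical helper; requires the mbstring extension and assumes UTF-8 input.
function mbSentenceCase(string $text, string $encoding = 'UTF-8'): string
{
    // Split off the first character (not the first byte).
    $first = mb_substr($text, 0, 1, $encoding);
    $rest  = mb_substr($text, 1, null, $encoding);

    // Uppercase the first character, lowercase everything after it.
    return mb_strtoupper($first, $encoding) . mb_strtolower($rest, $encoding);
}

echo mbSentenceCase('ÉCOLE NORMALE SUPÉRIEURE');
```

The helper could be dropped into the loop above in place of ucfirst, for example by passing it $node->nodeValue.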

Answer 3 (score: 1)

Using DOMDocument:

<?php

        $var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";

        $dom = new DOMDocument();
        $dom->loadHTML($var);

        $tags = array("h1", "h2");
        //loop thru all h1 and h2 tags

        foreach ($tags as $tag) {
            //get all elements of the current tag
            $elements = $dom->getElementsByTagName($tag);
            //if we found at least 1 element
            //(a DOMNodeList object is never empty(), so check its length)
            if ($elements->length > 0) {
                //loop thru each element of the given tag
                foreach ($elements as $element) {
                    //run ucfirst on the nodevalue
                    //which is equivalent to the "textContent" property of a DOM node
                    $element->nodeValue = ucfirst(strtolower($element->nodeValue));
                }
            }
        }

$html = $dom->saveHTML();
//remove extra markup
$html = str_replace("</body></html>","",substr($html,strpos($html,"<h1>")));
echo $html;

Output:

<h1>Lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<h2>Lorem ipsum dolores amet</h2>
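The substr/strpos cleanup above assumes that the markup starts with an <h1> tag. As an alternative sketch, not taken from the original answer, the children of the <body> element that loadHTML creates can be serialized one by one, which avoids string surgery on the full document:

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML("<h1>LOREM IPSUM DOLORES AMET</h1>\nTHIS IS SOME TEXT\n<H2>LOREM IPSUM DOLORES AMET</H2>");

foreach (array('h1', 'h2') as $tag) {
    foreach ($dom->getElementsByTagName($tag) as $element) {
        $element->nodeValue = ucfirst(strtolower($element->nodeValue));
    }
}

// Serialize each child of <body> instead of trimming the full document string.
$html = '';
$body = $dom->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    $html .= $dom->saveHTML($child);
}
echo $html;
```

Passing a node to saveHTML() serializes only that node, so the DOCTYPE and the html/body wrapper never end up in the string. Tag names come back in lowercase, as in the output above.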