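I want to use PHP to convert the all-uppercase text inside h1, h2, ... tags to capitalized text (lowercase, with only the first letter uppercase). I'm close, but not quite there. The snippet below does not capitalize the first character of "LOREM" (probably because it tries to capitalize the '<'). It would be easy to fix this by changing the PHP callback, but I'm hoping to do it by changing only the regular expression: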
$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";
$line = preg_replace_callback(
'/<h[1-9]>(.*)\>/i',
function ($matches) {
return ucfirst(strtolower($matches[0]));
},
$var
);
print($line);
Result:
<h1>lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<H2>lorem ipsum dolores amet</H2>
Desired output:
<h1>Lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<H2>Lorem ipsum dolores amet</H2>
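Answer 0 (score: 3)
You are using $matches[0], which returns the entire match, including the opening tag. Use \K and a lookahead here so that only the heading text ends up in the match.
I suggest putting a capture group in the opening <h...> tag so it can be used as a backreference; that way the pattern only accepts a closing tag with the same level that the group matched.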
$text = preg_replace_callback('~<h([1-9])>\K[^<]++(?=</h\1>)~i',
function($m) {
return ucfirst(strtolower($m[0]));
}, $text);
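To see what the backreference buys you, here is a minimal sketch using the same pattern on a made-up input (the sample string and variable names are assumptions for illustration). The second heading has a mismatched closing tag, so it should be left alone:

```php
<?php
// Hypothetical sample input: the second heading has a mismatched closing tag.
$text = "<h1>LOREM IPSUM</h1>\n<h2>DOLORES AMET</h3>";

$result = preg_replace_callback(
    // \1 must repeat the digit captured by ([1-9]), so <h2>...</h3> is skipped
    '~<h([1-9])>\K[^<]++(?=</h\1>)~i',
    function ($m) {
        // $m[0] holds only the heading text because \K reset the match start
        return ucfirst(strtolower($m[0]));
    },
    $text
);

echo $result, "\n";
```

Only the well-formed h1 is rewritten; the h2 with the wrong closing tag keeps its original text.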
Although you can do this with a regular expression, I would recommend using DOM instead.
$doc = new DOMDocument();
// loadHTML() is an instance method; calling it statically fails in PHP 8
$doc->loadHTML('
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>
');
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//h1|//h2|//h3|//h4|//h5|//h6');
foreach ($nodes as $node) {
$node->nodeValue = ucfirst(strtolower($node->nodeValue));
}
echo $doc->saveHTML();
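Answer 1 (score: 1)
It should be $matches[1], not $matches[0]. $matches[0] refers to the whole match (so ucfirst and strtolower are applied to the whole match, tag included), while $matches[1] refers to the characters captured by group 1. Because the regex includes <h[1-9]>, it also matches the opening <h> tag, but the replacement returns only group 1, as in ucfirst(strtolower($matches[1])). As a result the opening <h> tag is removed. See the following example.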
$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";
$line = preg_replace_callback(
'/<h[1-9]>(.*)\>/i',
function ($matches) {
return ucfirst(strtolower($matches[1]));
},
$var
);
print($line);
Output:
Lorem ipsum dolores amet</h1
THIS IS SOME TEXT
Lorem ipsum dolores amet</h2
But the above also replaces the <h1> tag. So I suggest applying strtolower and ucfirst only to the part inside the <h> tag.
$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";
$line = preg_replace_callback(
'/<h[1-9]>\K.*?(?=<)/i',
function ($matches) {
return ucfirst(strtolower($matches[0]));
},
$var
);
print($line);
Output:
<h1>Lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<H2>Lorem ipsum dolores amet</H2>
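\K discards the characters matched so far from the reported match, so the opening tag is not part of $matches[0]. .*? does a non-greedy match of any characters, and the lookahead (?=<) stops it just before the next < character without consuming it.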
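As a quick illustration of the difference \K makes, here is a small sketch that runs the pattern with and without it on a one-line sample (the input string and variable names are assumptions, not from the question):

```php
<?php
// Hypothetical one-line input used only for illustration.
$input = '<h1>LOREM IPSUM</h1>';

// Without \K, the opening tag is part of the overall match.
preg_match('/<h[1-9]>.*?(?=<)/i', $input, $withTag);

// With \K, everything matched before it is dropped from the overall match.
preg_match('/<h[1-9]>\K.*?(?=<)/i', $input, $textOnly);

echo $withTag[0], "\n";  // <h1>LOREM IPSUM
echo $textOnly[0], "\n"; // LOREM IPSUM
```

In both cases the lookahead leaves the closing tag out of the match, which is why the callback never sees it.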
Answer 2 (score: 1)
No regex needed. Obligatory link. Don't use regex to parse HTML. Ever.
<?php
$HTMLString = <<<HTML
<h1>lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<h2>lorem ipsum dolores amet</h2>
HTML;
$doc = new DOMDocument();
$doc->loadHTML($HTMLString);
//You can also use xpath. Loop results after using this instead:
//$xpath = new DOMXPath($doc);
//$nodeList = $xpath->query('//h2');
$nodeList = $doc->getElementsByTagName('h2');
foreach ($nodeList as $node) {
$stringArray = explode(' ', $node->nodeValue);
$stringArray[0] = ucfirst($stringArray[0]);
$capitalizedSentence = implode(' ', $stringArray);
echo $capitalizedSentence;
}
Answer 3 (score: 1)
Using DOMDocument:
<?php
$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";
$dom = new DOMDocument();
$dom->loadHTML($var);
$tags = array("h1", "h2");
//loop thru all h1 and h2 tags
foreach ($tags as $tag) {
//get all elements of the current tag
$elements = $dom->getElementsByTagName($tag);
//if we found at least 1 element
//(empty() is never true for a DOMNodeList object, so check its length)
if ($elements->length > 0) {
//loop thru each element of the given tag
foreach ($elements as $element) {
//run ucfirst on the nodevalue
//which is equivalent to the "textContent" property of a DOM node
$element->nodeValue = ucfirst(strtolower($element->nodeValue));
}
}
}
$html = $dom->saveHTML();
//remove extra markup
$html = str_replace("</body></html>", "", substr($html, strpos($html, "<h1>")));
echo $html;
Output:
<h1>Lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<h2>Lorem ipsum dolores amet</h2>
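The string trimming in the code above depends on the exact wrapper that saveHTML() emits and on the document starting with an <h1>. As a possible alternative, here is a sketch that serializes only the children of <body> by passing each node to saveHTML(); the variable names are illustrative, and the XPath query is borrowed from the first answer:

```php
<?php
$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";

$dom = new DOMDocument();
$dom->loadHTML($var);

// The HTML parser lowercases tag names, so <H2> is found by //h2.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//h1|//h2|//h3|//h4|//h5|//h6') as $heading) {
    $heading->nodeValue = ucfirst(strtolower($heading->nodeValue));
}

// Serialize only the children of <body>, skipping the doctype/html/body wrapper.
$body = $dom->getElementsByTagName('body')->item(0);
$html = '';
foreach ($body->childNodes as $child) {
    $html .= $dom->saveHTML($child);
}

echo $html;
```

This avoids searching the serialized string for a specific tag, so it should also work when the fragment does not begin with an <h1>.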