I have some HTML strings stored in PHP variables, for example:
"Some random text <p> <span></span> </p> and the random text continues"
"<p>Some random</p> <p> <span></span> </p> and the <span> </span>"
How can I remove the <p> and <span> tags that contain nothing but whitespace? Something like this:
"Some random text and the random text continues"
"<p>Some random</p> and the "
Answer 0 (score: 3)
Because the empty tags can be nested inside each other, you need a recursive pattern:
$data = <<<'EOD'
Some random text <p> <span> </span> </p> and the random text continues
<p>Some random</p> <p> <span></span> </p> and the <span> </span>
EOD;
$pattern = '~<(p|span)>(?>\s+|&nbsp;|(?R))*</\1>~';
$result = preg_replace($pattern, '', $data);
echo $result;
Pattern details:
~              # pattern delimiter
<(p|span)>     # the tag name is captured in capture group 1
(?>            # open an atomic group: all the content that must be ignored
    \s+        # whitespace
  |            # OR
    &nbsp;     # a literal &nbsp; entity
  |            # OR
    (?R)       # recursion: the whole pattern again, for nested empty tags
)*             # repeat the atomic group
</\1>          # closing tag: with a backreference to capture group 1
~
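To see the recursion at work on the strings from the question, here is a minimal sketch that wraps the same pattern in a function. The name stripEmptyTags and the $tags parameter are hypothetical, added only for illustration; the pattern itself is the one described above.

```php
<?php
// Hypothetical helper (not part of the answer above): builds the recursive
// pattern for a configurable list of tag names.
function stripEmptyTags(string $html, array $tags = ['p', 'span']): string
{
    // Escape each tag name so it is safe to embed in the pattern.
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '~');
    }, $tags));

    // <(tag)> ... </\1> where the content may only be whitespace, a literal
    // &nbsp; entity, or another empty tag matched recursively via (?R).
    $pattern = '~<(' . $names . ')>(?>\s+|&nbsp;|(?R))*</\1>~';

    // preg_replace() returns null on failure; fall back to the input then.
    return preg_replace($pattern, '', $html) ?? $html;
}

$samples = [
    'Some random text <p> <span></span> </p> and the random text continues',
    '<p>Some random</p> <p> <span></span> </p> and the <span> </span>',
];

foreach ($samples as $sample) {
    var_dump(stripEmptyTags($sample));
}
```

Note that only the tags are removed: the spaces that surrounded them stay, so you may want to collapse repeated whitespace afterwards. The pattern also only matches bare tags; something like <p class="x"> </p> is left untouched.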
With DOMDocument, you can do it like this:
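function removeEmptyTags($html, $tags = false) {
    // Silence libxml warnings about HTML fragments; the previous setting is restored below.
    $state = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    // Wrap the fragment in a <div> so it has a single root element.
    $dom->loadHTML("<div>$html</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
    libxml_clear_errors();
    libxml_use_internal_errors($state);

    $xp = new DOMXPath($dom);
    $xp->registerNamespace('php', 'http://php.net/xpath');
    $xp->registerPHPFunctions('isEmpty');

    // Optional predicate restricting the search to the given tag names.
    $predicate = '';
    if ($tags) {
        $predicate = '[' . implode(' or ', array_map(function($i) {
            return 'name()="' . $i . '"';
        }, $tags)) . ']';
    }

    // Select the elements whose text content is empty or whitespace only.
    $nodeList = $xp->query('//*'. $predicate . '[php:functionString("isEmpty", .)]');
    foreach ($nodeList as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself.
    $result = '';
    foreach ($dom->documentElement->childNodes as $node) {
        $result .= $dom->saveHTML($node);
    }
    return $result;
}

function isEmpty($txt) {
    // \x{A0} is the non-breaking space that &nbsp; becomes once the HTML is parsed.
    return preg_match('~^(?:\s|\x{A0})*$~u', $txt) === 1;
}

echo removeEmptyTags($data, ['p', 'span']);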
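If you would rather not register a PHP callback, the same emptiness test can probably be expressed in plain XPath 1.0. The following is only a sketch under that assumption: the function name removeEmptyTagsXPath is hypothetical, $tags is assumed to be a non-empty list of trusted tag names, and, like the version above, it treats an element as empty when its text content is whitespace only.

```php
<?php
// Hypothetical variant (not from the answer above): the whitespace check is
// done with translate() and normalize-space() instead of a PHP callback.
function removeEmptyTagsXPath(string $html, array $tags = ['p', 'span']): string
{
    $state = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    // Wrap the fragment in a <div> so it has a single root element.
    $dom->loadHTML("<div>$html</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
    libxml_clear_errors();
    libxml_use_internal_errors($state);

    // Build: name()="p" or name()="span" (assumes trusted tag names).
    $names = implode(' or ', array_map(function ($tag) {
        return 'name()="' . $tag . '"';
    }, $tags));

    // &nbsp; is parsed into U+00A0; turn it into a plain space, then let
    // normalize-space() strip the whitespace and compare with "".
    $nbsp = "\u{00A0}";
    $query = '//*[' . $names . ']'
           . '[normalize-space(translate(., "' . $nbsp . '", " ")) = ""]';

    $xp = new DOMXPath($dom);
    foreach ($xp->query($query) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself.
    $result = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

echo removeEmptyTagsXPath('<p>Some random</p> <p>&nbsp;<span></span> </p> and the <span> </span>');
```

Keep in mind that both DOM versions look only at text content, so an element holding nothing but, say, an <img> would also count as empty.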