PHP: remove <p> and <span> tags that contain only whitespace

Date: 2014-08-02 15:55:31

Tags: php regex

I have some HTML strings stored in PHP variables, containing strings like these:

"Some random text <p>&nbsp;<span></span>&nbsp;</p> and the random text continues"
"<p>Some random</p> <p>&nbsp;<span></span>&nbsp;</p> and the <span> </span>"

How can I remove the <p> and <span> tags that contain only whitespace, to get something like this:

"Some random text and the random text continues"
"<p>Some random</p> and the "

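1 answer:

Answer 0 (score: 3)

You need to use recursion, since the empty tags can be nested (a blank <span> inside a blank <p>):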

$data = <<<'EOD'
Some random text <p>&nbsp;<span> </span>&nbsp;</p> and the random text continues
<p>Some random</p> <p>&nbsp;<span></span>&nbsp;</p> and the <span> </span>
EOD;

$pattern = '~<(p|span)>(?>\s+|&nbsp;|(?R))*</\1>~';

$result = preg_replace($pattern, '', $data);
echo $result;
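To see why the recursion is needed, here is a small comparison on the first sample string (the variable names are only illustrative). A one-level pattern can only match tags whose content is whitespace or &nbsp;. It therefore removes the inner <span> but leaves the outer <p> behind, because preg_replace does not go back over text it has already scanned.

```php
<?php
// Compare a one-level pattern with the recursive one on the same input
$input = 'Some random text <p>&nbsp;<span> </span>&nbsp;</p> and the random text continues';

// Without (?R): the content may only be whitespace or &nbsp;,
// so the <p> that contains a <span> cannot match in a single pass
$flat = '~<(p|span)>(?:\s+|&nbsp;)*</\1>~';

// With (?R): a nested empty <p>/<span> also counts as empty content
$recursive = '~<(p|span)>(?>\s+|&nbsp;|(?R))*</\1>~';

$flatResult = preg_replace($flat, '', $input);
$recursiveResult = preg_replace($recursive, '', $input);

echo $flatResult, "\n";       // Some random text <p>&nbsp;&nbsp;</p> and the random text continues
echo $recursiveResult, "\n";  // Some random text  and the random text continues
```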

Pattern details:

~            # pattern delimiter
<(p|span)>   # the tag name is captured in capture group 1
(?>          # open an atomic group: all the content that must be ignored
    \s+      # whitespace
  |          # OR
    &nbsp;   # &nbsp;
  |          # OR
    (?R)     # recursion: a nested empty <p> or <span>
)*           # repeat the atomic group
</\1>        # closing tag: a backreference to capture group 1
~
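With PCRE's x (extended) modifier, whitespace in the pattern is ignored and # starts a comment. The commented breakdown can therefore serve directly as the pattern. Below is a minimal sketch that assumes PHP 7 or later for the type declarations. The function name removeBlankTags is hypothetical.

```php
<?php
// Hypothetical helper: the commented pattern used as-is with the x modifier.
// The ~ delimiter must not appear inside any of the comments.
function removeBlankTags(string $html): string
{
    $pattern = <<<'REGEX'
~
<(p|span)>   # the tag name is captured in capture group 1
(?>          # atomic group: content that counts as empty
    \s+      # whitespace
  |
    &nbsp;   # non-breaking space entity
  |
    (?R)     # recursion: a nested empty <p> or <span>
)*
</\1>        # closing tag matching capture group 1
~x
REGEX;

    return preg_replace($pattern, '', $html);
}

echo removeBlankTags('<p>Some random</p> <p>&nbsp;<span></span>&nbsp;</p> and the <span> </span>'), "\n";
```

Like the compact version, this removes only the tags. The spaces that surrounded them remain, so the output here is "<p>Some random</p>  and the " with two spaces.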

Alternatively, with DOMDocument you can parse the fragment and remove every element whose text content is blank:

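function removeEmptyTags($html, $tags = false) {
    // Silence libxml warnings about the markup, remembering the previous setting
    $state = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    // Wrap the fragment in a <div> so it has a single root; the flags stop libxml
    // from adding a DOCTYPE and implied <html>/<body> elements
    $dom->loadHTML("<div>$html</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
    libxml_clear_errors();
    libxml_use_internal_errors($state);

    $xp = new DOMXPath($dom);
    // Make the PHP function isEmpty() callable from XPath
    $xp->registerNamespace('php', 'http://php.net/xpath');
    $xp->registerPHPFunctions('isEmpty');

    // Optional filter on element names, e.g. [name()="p" or name()="span"].
    // Without it, every blank element is removed, including <br> and <img>.
    $predicate = '';
    if ($tags) {
        $predicate = '[' . implode(' or ', array_map(function ($i) {
            return 'name()="' . $i . '"';
        }, $tags)) . ']';
    }

    // Descendants of the wrapper <div> only, so the wrapper itself is never removed
    $nodeList = $xp->query('/*//*' . $predicate . '[php:functionString("isEmpty", .)]');

    // The XPath result is not a live list, so removing nodes while looping is safe
    foreach ($nodeList as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize the children of the wrapper <div>, not the <div> itself
    $result = '';
    foreach ($dom->documentElement->childNodes as $node) {
        $result .= $dom->saveHTML($node);
    }

    return $result;
}

// True when the text is only whitespace. The parser has already decoded &nbsp;
// to U+00A0, so that character is matched explicitly
function isEmpty($txt) {
    return (bool) preg_match('~^(?:\s|\x{A0}|&nbsp;)*$~u', $txt);
}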

echo removeEmptyTags($data, ['p', 'span']);
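Both versions delete only the tags, so the spaces that surrounded them are left as double spaces ("Some random text  and the ..."). The question, however, shows single spaces. One minimal way to match that output is to post-process the result. This assumes it is acceptable to collapse every run of two or more spaces in the fragment, which would also change spacing inside <pre> or similar elements. The function name stripBlankTagsAndTidy is hypothetical.

```php
<?php
// Hypothetical helper: remove blank <p>/<span> tags with the recursive pattern,
// then collapse the runs of spaces left where the tags used to be.
// Assumption: collapsing every run of 2+ spaces is acceptable for this HTML.
function stripBlankTagsAndTidy(string $html): string
{
    $pattern = '~<(p|span)>(?>\s+|&nbsp;|(?R))*</\1>~';
    $stripped = preg_replace($pattern, '', $html);
    // Only literal spaces are collapsed; newlines and tabs are kept
    return preg_replace('~ {2,}~', ' ', $stripped);
}

echo stripBlankTagsAndTidy('Some random text <p>&nbsp;<span></span>&nbsp;</p> and the random text continues'), "\n";
```

On the two sample strings from the question, this returns "Some random text and the random text continues" and "<p>Some random</p> and the ".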