PHP - DOMDocument - Remove tags around text based on class

Asked: 2011-02-11 21:47:30

Tags: php html domdocument

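I have an HTML document from which I want to remove specific tags that carry a specific class. The tags have multiple classes. Here is a very simple example of the markup: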

<style>.c{background-color:yellow}</style>
This is a <span class="a b c">string</span>.  
This is <span class="a b c">another string</span>.  
This is <span class="a b">yet another string</span>.

I would like to parse that string (preferably with PHP's DOMDocument?), find only the <span> tags that have the class c, and end up with a result like this:

<style>.c{background-color:yellow}</style>
This is a string.  
This is another string.  
This is <span class="a b">yet another string</span>.

Basically, I want to remove the tags around the text but keep the text itself in the document.

Update: I think I'm close, but this isn't working for me:

$test = '<style>.c {background-color:yellow;}</style>' .
'This is a <span class="a b c">string</span>.'.
'This is <span class="a b c">another string</span>.' .
'This is <span class="a b">yet another string</span>.';

$doc = new DOMDocument();
$doc->loadHTML($test);
$xpath = new DOMXPath($doc);
$query = "//span[contains(@class, 'c')]"; // thanks to Gordon
$oldnodes = $xpath->query($query);

foreach ($oldnodes as $oldnode) {
    $txt = $oldnode->nodeValue;
    $oldnode->parentNode->replaceChild($txt, $oldnode);
}

echo $doc->saveHTML();

1 Answer:

Answer 0 (score: 2)

You're close... Create a fragment for the children:

$query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' c ')]";
$oldnodes = $xpath->query($query);

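foreach ($oldnodes as $node) {
    // Collect the span's children (text and elements) in a fragment.
    $fragment = $doc->createDocumentFragment();
    while ($node->childNodes->length > 0) {
        // appendChild() moves the node, so childNodes shrinks each pass.
        $fragment->appendChild($node->childNodes->item(0));
    }
    // Swap the span for its former children.
    $node->parentNode->replaceChild($fragment, $node);
}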

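Each iteration removes $node from the document, so there is nothing more to do before moving on to the next match; the foreach simply continues through the rest of the result set...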
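For convenience, here is a self-contained sketch that combines the test string from the question with the loop above, so it can be run as a single script. The variable name `$html` is only an illustrative choice and does not come from the original post.

```php
<?php
// Test string taken from the question.
$test = '<style>.c {background-color:yellow;}</style>' .
    'This is a <span class="a b c">string</span>.' .
    'This is <span class="a b c">another string</span>.' .
    'This is <span class="a b">yet another string</span>.';

$doc = new DOMDocument();
$doc->loadHTML($test);
$xpath = new DOMXPath($doc);

// Match only spans whose class list contains the exact token "c".
$query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' c ')]";

foreach ($xpath->query($query) as $node) {
    // Move every child of the span into a fragment...
    $fragment = $doc->createDocumentFragment();
    while ($node->childNodes->length > 0) {
        $fragment->appendChild($node->childNodes->item(0));
    }
    // ...then put the fragment where the span used to be.
    $node->parentNode->replaceChild($fragment, $node);
}

// $html is a hypothetical name used here only to hold the output.
$html = $doc->saveHTML();
echo $html;
```

Because loadHTML() is called without extra options, saveHTML() prints a full document (doctype, html, head and body wrappers) rather than only the original fragment.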

This also handles the case where the span contains more than just text:

<span class="a b c">foo <b>bar</b> baz</span>
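A minimal sketch of that nested case, assuming the span sits inside a `<p>` wrapper (the wrapper and the `$result` variable are illustrative additions, not part of the original answer). It shows that the `<b>` element survives because the children are moved as nodes, whereas reading `nodeValue` would flatten them to plain text.

```php
<?php
$doc = new DOMDocument();
// The <p> wrapper is an assumption made so the result is easy to print.
$doc->loadHTML('<p><span class="a b c">foo <b>bar</b> baz</span></p>');
$xpath = new DOMXPath($doc);

$query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' c ')]";

foreach ($xpath->query($query) as $node) {
    $fragment = $doc->createDocumentFragment();
    // Text nodes and the <b> element are all moved, in order.
    while ($node->childNodes->length > 0) {
        $fragment->appendChild($node->childNodes->item(0));
    }
    $node->parentNode->replaceChild($fragment, $node);
}

// Serialize only the paragraph instead of the whole document.
$result = $doc->saveHTML($doc->getElementsByTagName('p')->item(0));
echo $result;
```

The span is gone from the printed paragraph, while "foo", the bold "bar" and "baz" stay in their original order.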

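Note on the recent edit: I changed the XPath query to be more robust, since it now matches only the exact class c rather than also matching a class such as toc...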
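The difference between the two queries can be checked with a small sketch. The markup and the variable names `$naive` and `$exact` are made up for illustration; only the two XPath expressions come from the post.

```php
<?php
$doc = new DOMDocument();
// Hypothetical markup: one span with class "toc", one with classes "a b c".
$doc->loadHTML('<p><span class="toc">one</span><span class="a b c">two</span></p>');
$xpath = new DOMXPath($doc);

// Substring test from the question: "toc" contains the letter "c", so it matches too.
$naive = $xpath->query("//span[contains(@class, 'c')]");

// Token test from the answer: pads the class list with spaces and looks for " c ".
$exact = $xpath->query("//span[contains(concat(' ', normalize-space(@class), ' '), ' c ')]");

echo $naive->length, "\n"; // 2
echo $exact->length, "\n"; // 1
```

normalize-space() collapses runs of whitespace in the attribute, so the padded comparison still works when classes are separated by tabs or several spaces.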

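Oddly, this lets you remove nodes during the iteration without affecting the results (I know that has caused problems before, I just don't know why it doesn't here). But this is tested code and should work fine.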
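One plausible explanation, which the answer itself does not state, is that the list returned by DOMXPath::query() is a static snapshot, while the list returned by getElementsByTagName() is live and changes as the document changes. The sketch below compares the two on made-up markup; the variable names are illustrative.

```php
<?php
$doc = new DOMDocument();
// Hypothetical markup with two spans.
$doc->loadHTML('<p><span>one</span><span>two</span></p>');
$xpath = new DOMXPath($doc);

$static = $xpath->query('//span');               // result of an XPath query
$live   = $doc->getElementsByTagName('span');    // live view of the document

// Remove the first span from the document.
$first = $static->item(0);
$first->parentNode->removeChild($first);

$staticLength = $static->length; // still counts the removed node
$liveLength   = $live->length;   // reflects the removal

echo $staticLength, "\n"; // 2
echo $liveLength, "\n";   // 1
```

If that is the cause, removing nodes inside a loop over getElementsByTagName() can skip elements, while the same loop over an XPath result visits every match.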