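I have an HTML document from which I want to remove specific tags that carry a specific class. The tags have multiple classes. Here is a very simple example of my markup: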
<style>.c{background-color:yellow}</style>
This is a <span class="a b c">string</span>.
This is <span class="a b c">another string</span>.
This is <span class="a b">yet another string</span>.
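I'd like to be able to parse that string (preferably with PHP's DOMDocument?), find only the <span> tags that have class c, and end up with a result that looks like this: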
<style>.c{background-color:yellow}</style>
This is a string.
This is another string.
This is <span class="a b">yet another string</span>.
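Basically, I want to remove the tags around the text but keep the text itself in the document.
Update: I think I'm close, but it isn't working for me: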
$test = '<style>.c {background-color:yellow;}</style>' .
'This is a <span class="a b c">string</span>.'.
'This is <span class="a b c">another string</span>.' .
'This is <span class="a b">yet another string</span>.';
$doc = new DOMDocument();
$doc->loadHTML($test);
$xpath = new DOMXPath($doc);
$query = "//span[contains(@class, 'c')]"; // thanks to Gordon
$oldnodes = $xpath->query($query);
foreach ($oldnodes as $oldnode) {
    $txt = $oldnode->nodeValue;
    $oldnode->parentNode->replaceChild($txt, $oldnode);
}
echo $doc->saveHTML();
Answer 0 (score: 2)
You're close... replaceChild() expects a node, not a string, so create a fragment to hold the children:
$query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' c ')]";
$oldnodes = $xpath->query($query);
foreach ($oldnodes as $node) {
    $fragment = $doc->createDocumentFragment();
    // Move every child of the span into the fragment, in order.
    while ($node->childNodes->length > 0) {
        $fragment->appendChild($node->childNodes->item(0));
    }
    // Swap the span for its former children.
    $node->parentNode->replaceChild($fragment, $node);
}
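Because each iteration replaces $node, you don't need any special iteration technique such as looping backwards. As far as I can tell, the result of DOMXPath::query() behaves like a snapshot rather than a live list, so changing the document does not shift the remaining results...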
This will also handle the case where the span contains more than just text:
<span class="a b c">foo <b>bar</b> baz</span>
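Putting the pieces together, here is a minimal self-contained sketch of the fragment approach. The helper name unwrapSpansByClass, the wrapping <div>, and the extra check for empty spans are my own assumptions for illustration and not part of the tested code above; it also assumes the class name contains no quotes or whitespace.

```php
<?php
// Hypothetical helper (not from the original answer): unwrap every <span>
// that carries the given class token, keeping its children in place.
function unwrapSpansByClass(string $html, string $class): string
{
    $doc = new DOMDocument();
    // Wrap the snippet in a <div> so it can be found again after parsing.
    $doc->loadHTML('<div>' . $html . '</div>');
    $xpath = new DOMXPath($doc);

    // Assumption: $class contains no quotes or whitespace.
    $query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";

    foreach ($xpath->query($query) as $node) {
        if (!$node->hasChildNodes()) {
            // Nothing to keep, so just drop the empty span.
            $node->parentNode->removeChild($node);
            continue;
        }
        $fragment = $doc->createDocumentFragment();
        // Move the children (text and elements alike) into the fragment.
        while ($node->firstChild !== null) {
            $fragment->appendChild($node->firstChild);
        }
        // Swap the span for its former children.
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the contents of the wrapper, not the whole document.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$result = unwrapSpansByClass(
    'This is <span class="a b c">foo <b>bar</b> baz</span> and <span class="toc">kept</span>.',
    'c'
);
echo $result, "\n";
```

Serializing only the wrapper's children avoids the doctype, <html> and <body> that saveHTML() would otherwise add around a snippet.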
Note the recent edit: I changed the XPath query to be more robust, since it now matches only the exact class c rather than, say, toc...
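To see the difference between the two queries, here is a small sketch with made-up sample markup (the three spans are my own example, not from the question). The substring test matches any class attribute containing the letter c, while the padded, space-normalized test matches only the whole token c.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML(
    '<p><span class="a b c">x</span>' .
    '<span class="toc">y</span>' .
    '<span class="a  c">z</span></p>'
);
$xpath = new DOMXPath($doc);

// Substring match: also hits class="toc".
$loose = $xpath->query("//span[contains(@class, 'c')]");

// Token match: pad the normalized class list with spaces and look for ' c '.
$exact = $xpath->query(
    "//span[contains(concat(' ', normalize-space(@class), ' '), ' c ')]"
);

$looseCount = $loose->length; // 3: all three spans contain the letter "c"
$exactCount = $exact->length; // 2: only the spans with the token "c"
echo "loose: $looseCount, exact: $exactCount\n";
```

normalize-space() collapses runs of whitespace, which is why the span with two spaces in its class attribute still matches.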
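Oddly enough, it lets you remove nodes during the iteration without affecting the results (I know I've seen that work before; the likely reason is that XPath query results are not live, unlike the list returned by getElementsByTagName()). Either way, this is tested code and should work fine.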
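One way to check why removal during iteration is safe here is to compare the two kinds of node list directly. This is a sketch based on my understanding that getElementsByTagName() returns a live list while DOMXPath::query() returns a fixed set of nodes; the sample markup is made up.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<p><span>a</span><span>b</span><span>c</span></p>');

// Live list: re-evaluated against the current state of the document.
$live = $doc->getElementsByTagName('span');

// XPath result: the set of nodes found when the query ran.
$xpath = new DOMXPath($doc);
$snapshot = $xpath->query('//span');

// Remove the first span from the document.
$first = $snapshot->item(0);
$first->parentNode->removeChild($first);

$liveCount = $live->length;         // reflects the removal
$snapshotCount = $snapshot->length; // still counts the removed node
echo "live: $liveCount, snapshot: $snapshotCount\n";
```

If you iterate over a live list instead, removing nodes inside the loop can skip elements, so collecting the nodes first (or using XPath as above) is the safer pattern.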