Here is my code (Online Demo):
```php
<?php
$html_string = <<<STR
<p>paragraph<a>link</a></p>
<div class="myclass">
<div>something</div>
<div style="mystyle">something</div>
<b><a href="#">link</a></b>
<b><a href="#" name="a name">link</a></b>
<b style="color:red">bold</b>
<img src="../path" alt="something" />
<img src="../path" alt="something" class="myclass" />
</div>
STR;

$dom = new DOMDocument;
$dom->loadHTML(mb_convert_encoding($html_string, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@*');
foreach ($nodes as $node) {
    if ($node->nodeName != "src" && $node->nodeName != "href" && $node->nodeName != "alt") {
        $node->parentNode->removeAttribute($node->nodeName);
    }
}
echo $dom->saveHTML();
```
As you can see in the demo, `</p>` ends up in the wrong position in the output: it has moved. Why does this happen, and how can I fix it?
Answer (score: 1)
Every DOMDocument needs a root node. For an HTML document it is usually the `<html>` node.
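A root node is required, and `LIBXML_HTML_NOIMPLIED` stops libxml from adding the implied `<html>` and `<body>` elements, so in your case libxml takes the first node, your `p` element, as the root node.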
That is why the next node, `div[@class="myclass"]`, becomes a child of the `p` element in the output of `$dom->saveHTML();`.
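To check this on your own setup, here is a small diagnostic sketch using a hypothetical two-element fragment (not the markup from the question). It loads the fragment with the same flags and prints the libxml2 version, the document element, and the parent of the `<div>`. Where the behavior described above applies, the parent is `p`; tree building can differ between libxml2 versions, so treat the printed parent as version-dependent.

```php
<?php
// Hypothetical fragment: two top-level elements with no common root.
$fragment = '<p>one</p><div>two</div>';

$dom = new DOMDocument();
// Same flags as in the question: no implied <html>/<body>, no default doctype.
$dom->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// libxml2 version PHP was compiled against.
echo 'libxml2: ', LIBXML_DOTTED_VERSION, PHP_EOL;

// The first element of the fragment is the document element.
echo 'root: ', $dom->documentElement->nodeName, PHP_EOL;

// "p" means the <div> was nested inside the first element;
// "#document" means it stayed a top-level sibling.
$div = $dom->getElementsByTagName('div')->item(0);
echo 'div parent: ', $div->parentNode->nodeName, PHP_EOL;

echo $dom->saveHTML();
```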
To solve the problem, wrap your markup in a root node such as `<html>`:
```php
<?php
// The whole fragment is wrapped in a single <html> root node.
$html_string = <<<STR
<html>
<p>paragraph<a>link</a></p>
<div class="myclass">
<div>something</div>
<div style="mystyle">something</div>
<b><a href="#">link</a></b>
<b><a href="#" name="a name">link</a></b>
<b style="color:red">bold</b>
<img src="../path" alt="something" />
<img src="../path" alt="something" class="myclass" />
</div>
</html>
STR;

$dom = new DOMDocument;
$dom->loadHTML(mb_convert_encoding($html_string, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@*');
foreach ($nodes as $node) {
    if ($node->nodeName != "src" && $node->nodeName != "href" && $node->nodeName != "alt") {
        $node->parentNode->removeAttribute($node->nodeName);
    }
}
echo $dom->saveHTML();
```
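If you would rather not have the `<html>` wrapper in the final output, one option is to serialize only the wrapper's children, since `DOMDocument::saveHTML()` accepts a node argument. The sketch below folds the attribute whitelist loop into a hypothetical `stripAttributes()` helper. Two details are assumptions of this sketch rather than part of the answer: it uses `DOMAttr::$ownerElement` to reach the element that owns each attribute, and it replaces `mb_convert_encoding(..., 'HTML-ENTITIES', ...)`, which is deprecated as of PHP 8.2, with `mb_encode_numericentity()`. Like the original code, it needs the mbstring extension.

```php
<?php
/**
 * Hypothetical helper: removes every attribute whose name is not in
 * $allowed and returns the fragment without the temporary <html> wrapper.
 */
function stripAttributes(string $html, array $allowed = ['src', 'href', 'alt']): string
{
    // Convert every non-ASCII code point to a numeric entity so UTF-8
    // input survives loadHTML() unchanged.
    $html = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    // Give the fragment a single root node, as the answer suggests.
    $dom->loadHTML('<html>' . $html . '</html>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($dom);
    // Same logic as the original loop: drop every attribute not whitelisted.
    foreach ($xpath->query('//@*') as $attr) {
        if (!in_array($attr->nodeName, $allowed, true)) {
            $attr->ownerElement->removeAttribute($attr->nodeName);
        }
    }

    // Serialize the children of <html> one by one, leaving the wrapper out.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo stripAttributes('<p>paragraph<a>link</a></p><div class="myclass"><img src="../path" alt="something" class="myclass" /></div>');
```

Building the wrapper inside the function keeps callers from having to edit their markup, at the cost of one extra pass over the wrapper's children when serializing.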