DOMDocument strips HTML elements

Time: 2013-08-29 16:21:03

Tags: php xpath domdocument

Here is my code:

$text = '<div class="cgus_post"><a href="?p=15055"><div class="imgbox"><img src="/cgmedia/default.gif"></div></a>
        <h2 id="post-15055">
        <a href="?p=15055" rel="bookmark" title="Permanent Link to Willie Nelson Celebrates 80th Birthday Stoned and Auditioning for Gandalf">Willie Nelson Celebrates 80th Birthday Stoned and Auditioning for Gandalf</a></h2>
        <p>This video pretty much sums up why Willie Nelson is fucking awesome. Willie decided to celebrate his 80th birthday by recording an ‘audition’ for Peter Jackson. &nbsp;Willie wants to take the reigns from Ian McKellan in The Hobbit 2, and decided to show off his acting skills and give some of his own wizardly advice. The result is &nbsp;hilarious. Watch …</p>
        <br class="clear">
        </div>';
$dom = new DomDocument();
$dom->loadHTML($text);
$classname = 'cgus_post';
$finder = new DomXPath($dom);
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
foreach($nodes as $node){
    echo $node->nodeValue;  
}

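The problem I'm having is that I query for the div with the class cgus_post and only its text comes back. How do I get it to return the HTML elements?

1 answer:

Answer 0 (score: 0)

Here is the innerHTML function I always use: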

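// Returns the markup of all children of $node as a single string.
function innerHTML(DOMNode $node, $trim = true, $decode = true) {
   $innerHTML = '';

   foreach ($node->childNodes as $inner_node) {
      // Copy each child into a scratch document so it can be serialized on its own.
      $temp_container = new DOMDocument();
      $temp_container->appendChild($temp_container->importNode($inner_node, true));

      $innerHTML .= ($trim ? trim($temp_container->saveHTML()) : $temp_container->saveHTML());
   }

   // Optionally turn entities such as &amp; back into plain characters.
   return ($decode ? html_entity_decode($innerHTML) : $innerHTML);
}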

Then you use it like this:

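$dom = new DOMDocument();
$dom->loadHTML($text); // $text is the markup string from the question

// documentElement is <html>, its first child is <body>, and the first child of <body> is the cgus_post div.
// htmlentities() escapes the result so the tags are visible when echoed into a web page.
echo htmlentities(innerHTML($dom->documentElement->childNodes->item(0)->firstChild));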
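As an alternative, here is a minimal sketch that keeps the XPath query from the question and serializes each matched node with DOMDocument::saveHTML(), which accepts an optional node argument as of PHP 5.3.6. The markup is a shortened stand-in for the question's $text, and the $results array and its 'outer'/'inner' keys are names made up for this illustration. The exact whitespace of the serialized output can vary between libxml versions, so treat this as a starting point rather than a definitive implementation.

```php
<?php
// Shortened stand-in for the question's $text (an assumption for this sketch).
$text = '<div class="cgus_post"><h2 id="post-15055"><a href="?p=15055">Title</a></h2><p>Body</p></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($text);
libxml_clear_errors();

$classname = 'cgus_post';
$finder = new DOMXPath($dom);
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

$results = array();
foreach ($nodes as $node) {
    // Passing a node to saveHTML() serializes that node including its own tag.
    $outer = $dom->saveHTML($node);

    // Serializing each child separately gives the markup inside the node.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }

    $results[] = array('outer' => $outer, 'inner' => $inner);
}

foreach ($results as $result) {
    echo $result['outer'], "\n";
}
```

Use $result['inner'] instead of $result['outer'] if you only want the contents of the div, and wrap the output in htmlentities() if you are echoing it into a page and want to see the tags rather than have the browser render them.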