Extracting the HTML content of a tag without the outer tag in PHP

Time: 2012-08-24 09:39:09

Tags: php web-scraping domdocument html

I want to retrieve the HTML code inside a certain tag. I know DOMDocument can do this, but how can I extract the content without the outer tag?

For example,

$html = '<div><span>Hello world!</span><br><p>some other text</p></div>';    
$doc = new DOMDocument;
$doc->loadHTML($html);
echo $doc->saveXML($doc->getElementsByTagName('div')->item(0));

This outputs:

<div>
    <span>Hello world!</span>
    <br>
    <p>some other text</p>
</div>

I want it without the outer div tag. I tried nodeValue, but it strips all the tags and returns only the text.

$html = '<div><span>Hello world!</span><br><p>some other text</p></div>';    
$doc = new DOMDocument;
$doc->loadHTML($html);
$node = $doc->getElementsByTagName('div')->item(0);
echo $node->nodeValue;

Any ideas?

1 answer:

Answer 0 (score: 4)

OK, how about a PHP innerHTML implementation:

<?php 
$html = '<div><span>Hello world!</span><br><p>some other text</p></div>';    
$doc = new DOMDocument;
$doc->loadHTML($html);
$node = $doc->getElementsByTagName('div')->item(0);
echo DOMinnerHTML($node);

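// Returns the serialized HTML of all children of $element,
// i.e. the element's content without its own outer tag.
function DOMinnerHTML($element)
{
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child)
    {
        // Copy each child (deep) into a temporary document and serialize it.
        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild($tmp_dom->importNode($child, true));
        $innerHTML .= trim($tmp_dom->saveHTML());
    }
    return $innerHTML;
}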
?>
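
A shorter variant may also work, assuming PHP 5.3.6 or later, where DOMDocument::saveHTML() accepts an optional node argument. This is only a sketch under that assumption, and the function name innerHtml is made up for illustration. It serializes each child directly from the owning document, so no temporary document is needed:

```php
<?php
// Hypothetical helper (name is illustrative): concatenates the HTML of
// every child node, which leaves out the element's own outer tag.
// Assumes PHP >= 5.3.6, where saveHTML() takes an optional node.
function innerHtml(DOMNode $element)
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // Serialize the child with the document that owns it.
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

$html = '<div><span>Hello world!</span><br><p>some other text</p></div>';
$doc = new DOMDocument;
$doc->loadHTML($html);
$node = $doc->getElementsByTagName('div')->item(0);
echo innerHtml($node);
```

Because each child is serialized as HTML rather than XML, void elements such as br should come out without a self-closing slash.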