DOMDocument parsing: newlines are preserved for span but not for img

Time: 2016-05-19 01:39:49

Tags: php xml dom newline

See here: https://ideone.com/bjs3IC

Why are the newlines output correctly for the spans but not for the imgs?

<?php
    outputImages();
    outputSpans();




    function outputImages(){
        $html = "<div class='test'>
                    <pre>
                    <img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
                    <img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
                    <img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
                    </pre>
                </div>";
        getHtml($html);
    }


    function outputSpans(){
        $html = "<div class='test'>
                    <pre>
                    <span>a</span>
                    <span>b</span>
                    <span>c</span>
                    </pre>
                </div>";
        getHtml($html);
    }


    function getHtml($html){
        $doc = new DOMDocument;
        $doc->loadhtml($html);
        $xpath = new DOMXPath($doc);
        $tags = $xpath->query('//div[@class="test"]');
        print(get_inner_html($tags[0]));
    }


    function get_inner_html( $node ) {
        $innerHTML= '';
        $children = $node->childNodes;
        foreach ($children as $child) {
            $innerHTML .= $child->ownerDocument->saveXML( $child );
        }

        return $innerHTML;
    }

1 answer:

Answer 0 (score: 2)

The DOMDocument::loadHTML function takes a second parameter, options. It looks like LIBXML_NOBLANKS is one of the defaults (at least).

You can use

$doc->loadHTML($html, LIBXML_NOEMPTYTAG);

to override that default, and your code will then behave the same way for both samples.
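To check this on your own setup, the following is a minimal sketch that parses the same markup with and without the flag and dumps both results. The helper name innerHtmlOf and the image paths are made up for illustration, and the type declarations assume PHP 7 or later. Note that the PHP manual documents LIBXML_NOEMPTYTAG as a serialization option for DOMDocument::save and saveXML, so its effect on loadHTML is observed behaviour that may depend on the PHP and libxml2 versions.

```php
<?php
// Hypothetical helper (not from the original post): parse $html with the
// given libxml option flags and return the inner markup of the first
// <div class="test">, or an empty string if there is no such div.
function innerHtmlOf(string $html, int $options = 0): string
{
    $doc = new DOMDocument;
    $doc->loadHTML($html, $options);
    $xpath = new DOMXPath($doc);
    $div = $xpath->query('//div[@class="test"]')->item(0);
    if ($div === null) {
        return '';
    }
    $out = '';
    foreach ($div->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

// Placeholder image paths; the URLs in the post are truncated.
$html = "<div class='test'>
<pre>
<img src='a.png'>
<img src='b.png'>
</pre>
</div>";

// var_dump shows the exact string, including any newlines that survived.
var_dump(innerHtmlOf($html));                     // default options
var_dump(innerHtmlOf($html, LIBXML_NOEMPTYTAG));  // flag suggested above
```

Comparing the two dumps shows whether the whitespace between the img elements is kept in each case on your installation.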

P.S.
I'm not sure why you are using

print(get_inner_html($tags[0]));

The $tags variable is a DOMNodeList, so you should use $tags->item(0) to get the first tag.
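As a small sketch of that access pattern (the markup here is a shortened stand-in for the samples in the question): item() returns null when the index is out of range, so it is worth checking the result before using it.

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML("<div class='test'><span>a</span></div>");
$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList; item(0) is its first node, or null
// when the expression matched nothing.
$tags  = $xpath->query('//div[@class="test"]');
$first = $tags->item(0);

if ($first !== null) {
    echo $first->nodeName, "\n";
} else {
    echo "no match\n";
}
```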

Your complete code should look like this:

outputImages();
outputSpans();

function outputImages() {
    $html = "<div class='test'>
                <pre>
                <img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
                <img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
                <img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
                </pre>
            </div>";
    getHtml($html);
}

function outputSpans() {
    $html = "<div class='test'>
                <pre>
                <span>a</span>
                <span>b</span>
                <span>c</span>
                </pre>
            </div>";
    getHtml($html);
}

function getHtml($html) {
    $doc = new DOMDocument;
    $doc->loadHTML($html, LIBXML_NOEMPTYTAG);
    $xpath = new DOMXPath($doc);
    $tags = $xpath->query('//div[@class="test"]');
    print(get_inner_html($tags->item(0)));
}

function get_inner_html( $node ) {
    $innerHTML= '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}