PHP DOMDocument echo problem

Date: 2010-11-27 12:09:39

Tags: php domdocument

$content = '<!--<sup><span style="font-weight:bold;color:black;">0</span></sup><br/>-->
    <div class="popular-video-image">
        <a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>">
            <img src="/images/topvideo/1.jpg" alt=""/>
        </a>
        <span class="popular-video-artist ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Far East Movement</a></span>
        <span class="popular-video-title ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Like a G6</a></span>
    </div>';

    $dom = new DOMDocument;
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($content);
    foreach ($dom->getElementsByTagName('a') as $node)
    {
        $node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
    }
    $dom->formatOutput = true;

    echo $dom->saveXml($dom->documentElement);

Output:

<html>
  <body>
    <div class="popular-video-image">&#13;
        <a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="&lt;lang video_go_to=Far East Movement - Like a G6&gt;">&#13;
            <img src="/images/topvideo/1.jpg" alt=""/></a>&#13;
        <span class="popular-video-artist ellipsis"><a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="&lt;lang video_go_to=Far East Movement - Like a G6&gt;" class="ellipsis">Far East Movement</a></span>&#13;
        <span class="popular-video-title ellipsis"><a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="&lt;lang video_go_to=Far East Movement - Like a G6&gt;" class="ellipsis">Like a G6</a></span>&#13;
    </div>

  </body>
</html>

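I don't want the html and body tags added. I also don't want the <lang ...> markup replaced with &lt;lang ...&gt;, and the &#13; is unnecessary too.

I want to get back the same content that went in, with only the links modified.

Sorry for my bad English!

4 Answers:

Answer 0: (score: 4)

You see &#13; at the end of each line because your HTML has Windows-style line endings (CR+LF). To get rid of them, convert them to Unix-style LF line endings before feeding the content to DOMDocument: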

$content = preg_replace('/\r\n/', "\n", $content);
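If the input might also contain lone CR characters (old Mac-style line endings), a plain str_replace with an ordered search list covers both cases without a regular expression. This is only a sketch; the helper name normalizeLineEndings and the sample string are made up for illustration:

```php
<?php
// Hypothetical helper, not part of the original answer.
function normalizeLineEndings($text)
{
    // str_replace processes the search array in order:
    // CR+LF pairs are collapsed first, then any remaining lone CR.
    return str_replace(array("\r\n", "\r"), "\n", $text);
}

$sample = "<div>\r\n<a href=\"x\">y</a>\r\n</div>";
$normalized = normalizeLineEndings($sample);
```

Run this on $content before calling loadHTML, so the parser never sees a CR character to serialize as &#13;.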

Answer 1: (score: 3)

saveXml takes an optional parameter specifying the node to output.

$dom->saveXml($dom->documentElement->firstChild->firstChild);

This removes the html and body tags from the output.
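The firstChild->firstChild chain assumes that body is the first child of html and that the wanted element is the first child of body. A slightly more defensive sketch, assuming the fragment may contain several top-level nodes, looks up body by tag name and serializes each of its children in turn. The sample markup and the $fragment variable are made up for illustration:

```php
<?php
// Illustrative input with two top-level elements.
$content = '<div class="box"><a href="video/1/">One</a></div><p>Two</p>';

$dom = new DOMDocument;
$dom->loadHTML($content);

// Prefix every link, as in the question.
foreach ($dom->getElementsByTagName('a') as $node) {
    $node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
}

// Serialize only the children of <body>, so the html/body wrappers
// that loadHTML adds never reach the output.
$body = $dom->getElementsByTagName('body')->item(0);
$fragment = '';
foreach ($body->childNodes as $child) {
    $fragment .= $dom->saveXML($child);
}

echo $fragment;
```

Because each child is serialized on its own, this does not depend on how many elements the original fragment contained.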

Answer 2: (score: 0)

I think the <html><body> tags are inserted because you are using loadHTML. Try loadXML instead.
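One caveat worth checking before switching: loadXML expects well-formed XML, and a raw < inside an attribute value (as in the title attributes of the question) is not allowed there. The sketch below, using a shortened made-up version of the markup, collects the parser errors instead of printing warnings so the result can be inspected:

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
// Shortened sample: the title attribute contains an unescaped "<".
$ok = $dom->loadXML('<div><a href="video/1/" title="<lang video_go_to=Like a G6>">Like a G6</a></div>');

$errors = libxml_get_errors();
libxml_clear_errors();

if (!$ok) {
    // Print the first parser message for diagnosis.
    echo trim($errors[0]->message), "\n";
}
```

With input like this, loadXML returns false, so the markup would have to be escaped first for this approach to work.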

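As for &lt;lang&gt; being replaced: the resulting XML would be invalid otherwise. If that causes problems for you, you should change your approach slightly and work with it rather than against it.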
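The escaping only happens when the node is serialized; the value stored in the DOM is unchanged. A small sketch with a shortened made-up version of the link shows the difference between reading the attribute and serializing the element:

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<a href="video/1/" title="<lang video_go_to=Like a G6>">Like a G6</a>');

$link = $dom->getElementsByTagName('a')->item(0);

// In-memory attribute value: the raw text, not escaped.
$title = $link->getAttribute('title');

// Serialized markup: "<" and ">" inside the attribute are written as entities.
$serialized = $dom->saveXML($link);
```

So any code that reads the attribute through the DOM API still sees the original <lang ...> text; only consumers of the serialized string see the entities.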

Answer 3: (score: 0)

<?php
    $content = '<!--<sup><span style="font-weight:bold;color:black;">0</span></sup><br/>-->
    <div class="popular-video-image">
        <a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>">
            <img src="/images/topvideo/1.jpg" alt=""/>
        </a>
        <span class="popular-video-artist ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Far East Movement</a></span>
        <span class="popular-video-title ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Like a G6</a></span>
    </div>';

    $dom = new DOMDocument;
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($content);
    foreach ($dom->getElementsByTagName('a') as $node)
    {
        $node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
    }
    $dom->formatOutput = true;

    echo preg_replace('#^<!DOCTYPE.+?>#', '', str_replace( array('<html>', '</html>', '<body>', '</body>', "\n\n", '&lt;', '&gt;'), array('', '', '', '', '', '<', '>',), $dom->saveHTML()));