$content = '<!--<sup><span style="font-weight:bold;color:black;">0</span></sup><br/>-->
<div class="popular-video-image">
<a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>">
<img src="/images/topvideo/1.jpg" alt=""/>
</a>
<span class="popular-video-artist ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Far East Movement</a></span>
<span class="popular-video-title ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Like a G6</a></span>
</div>';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->loadHTML($content);
foreach ($dom->getElementsByTagName('a') as $node)
{
$node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
}
$dom->formatOutput = true;
echo $dom->saveXml($dom->documentElement);
Output:
<html>
<body>
<div class="popular-video-image">
<a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="&lt;lang video_go_to=Far East Movement - Like a G6&gt;">
<img src="/images/topvideo/1.jpg" alt=""/></a>
<span class="popular-video-artist ellipsis"><a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="&lt;lang video_go_to=Far East Movement - Like a G6&gt;" class="ellipsis">Far East Movement</a></span>
<span class="popular-video-title ellipsis"><a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="&lt;lang video_go_to=Far East Movement - Like a G6&gt;" class="ellipsis">Like a G6</a></span>
</div>
</body>
</html>
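I don't want the html and body tags to be added. I also don't want the < and > inside the <lang ...> markers replaced with &lt; and &gt;. The &#13; at the end of each line is unnecessary too.
I want to get back the same content that went in, with only the links modified.
Sorry for my bad English!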
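Answer 0 (score: 4)
You see &#13; at the end of each line because your HTML has Windows-style line endings (CR+LF). To get rid of them, convert the line endings to Unix style (LF) before feeding the content to DOMDocument: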
$content = preg_replace('/\r\n/', "\n", $content);
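A lone CR (the old Mac-style line ending) would not be matched by the pattern above. As a minimal sketch, assuming the input may mix line-ending styles, the helper below handles both cases with `str_replace`. The name `normalizeLineEndings` and the sample string are hypothetical, not part of the original answer.

```php
<?php
// Hypothetical helper, not from the original answer.
// str_replace() works through the search array in order, so CR+LF pairs are
// collapsed first and any remaining lone CR is converted afterwards.
function normalizeLineEndings($text)
{
    return str_replace(array("\r\n", "\r"), "\n", $text);
}

// Sample input with Windows-style line endings (assumed for illustration).
$content = "<div>\r\n<a href=\"video/1/\">One</a>\r\n</div>";
$content = normalizeLineEndings($content);
```

Run this before `loadHTML()`, in the same place as the `preg_replace` call.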
Answer 1 (score: 3)
saveXML accepts an optional parameter that specifies which node to output.
$dom->saveXml($dom->documentElement->firstChild->firstChild);
This removes the html and body tags from the output.
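Note that `firstChild->firstChild` returns only the first child of `<body>`, so it relies on the fragment having a single top-level element. Here is a minimal sketch for fragments with several top-level elements, assuming PHP 5.3.6 or later (where `saveHTML` accepts a node argument). The helper name `innerBodyHtml` and the sample markup are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: serialize every child of <body>, so the <html> and
// <body> wrappers added by loadHTML() are left out of the result.
function innerBodyHtml(DOMDocument $dom)
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $dom->saveHTML($child); // node argument needs PHP >= 5.3.6
    }
    return $html;
}

libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$dom = new DOMDocument;
// Sample fragment with two top-level elements (assumed for illustration).
$dom->loadHTML('<div><a href="video/1/">One</a></div><p><a href="video/2/">Two</a></p>');

foreach ($dom->getElementsByTagName('a') as $node) {
    $node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
}

$html = innerBodyHtml($dom);
echo $html;
```

Because `saveHTML` serializes as HTML rather than XML, void elements such as `<img>` are written without the self-closing slash.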
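Answer 2 (score: 0)
I think the <html> and <body> tags are added because you are using loadHTML. Try loadXML instead.
As for <lang>, its angle brackets have to be escaped, otherwise the resulting XML would be invalid. If that causes you problems, you should change your approach slightly and work with it rather than against it.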
Answer 3 (score: 0)
<?php
$content = '<!--<sup><span style="font-weight:bold;color:black;">0</span></sup><br/>-->
<div class="popular-video-image">
<a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>">
<img src="/images/topvideo/1.jpg" alt=""/>
</a>
<span class="popular-video-artist ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Far East Movement</a></span>
<span class="popular-video-title ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Like a G6</a></span>
</div>';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->loadHTML($content);
foreach ($dom->getElementsByTagName('a') as $node)
{
$node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
}
$dom->formatOutput = true;
echo preg_replace('#^<!DOCTYPE.+?>#', '', str_replace(array('<html>', '</html>', '<body>', '</body>', "\n\n", '&lt;', '&gt;'), array('', '', '', '', '', '<', '>'), $dom->saveHTML()));