I'm using file_get_contents to fetch the HTML source of a remote page with the following code:
<?php
//Get the url
$url = "remotesite/static/section35.html";
$html = file_get_contents($url);
$doc = new DOMDocument(); // create the DOMDocument
libxml_use_internal_errors(true); // suppress warnings about malformed HTML
$doc->loadHTML($html); // parse the fetched HTML
$elements = $doc->getElementsByTagName('tbody');
$toRemove = array();
// gather the tables (parents of the matching tbody elements) to remove
foreach ($elements as $el) {
    if ((strpos($el->nodeValue, 'desktop') !== false || strpos($el->nodeValue, 'Recommended') !== false)
        && !in_array($el->parentNode, $toRemove, true)) {
        $toRemove[] = $el->parentNode;
    }
}
// remove them after the loop, because the node list is live
foreach ($toRemove as $table) {
    $table->parentNode->removeChild($table);
}
echo $doc->saveHTML(); // output the modified HTML
?>
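What I want to do now is remove every h3 closing tag (</h3>) from the source and then echo the result to my page, because that is the only way the content displays correctly.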
Answer 0 (score: 0)
echo str_replace('</h3>','',$doc->saveHTML());
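As a minimal sketch, the question's table filtering and the answer's str_replace can be combined into one function that takes an HTML string instead of fetching a URL, which makes it easy to try offline. The function name filterAndStripH3 and its $keywords parameter are hypothetical, not from the question. libxml's HTML parser lowercases tag names, so the output of saveHTML() should contain the literal </h3> even if the source used </H3>.

```php
<?php
// Hypothetical helper: removes every table whose tbody text contains one of
// $keywords, then strips all </h3> closing tags from the serialized HTML.
function filterAndStripH3(string $html, array $keywords): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // don't emit warnings for sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Collect the parent tables first; the DOMNodeList is live.
    $toRemove = [];
    foreach ($doc->getElementsByTagName('tbody') as $tbody) {
        $table = $tbody->parentNode;
        foreach ($keywords as $keyword) {
            if (strpos($tbody->nodeValue, $keyword) !== false
                && !in_array($table, $toRemove, true)) {
                $toRemove[] = $table;
                break;
            }
        }
    }
    foreach ($toRemove as $table) {
        $table->parentNode->removeChild($table);
    }

    // The answer's approach: drop the closing tags from the serialized string.
    return str_replace('</h3>', '', $doc->saveHTML());
}
```

With the question's variables, usage would look like `echo filterAndStripH3(file_get_contents($url), ['desktop', 'Recommended']);`.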