Remove closing h3 tags only after getting the source via file_get_contents

Asked: 2013-05-13 09:59:43

Tags: php html

I'm using file_get_contents to fetch the HTML source of a remote page with the following code:

<?php
    // Fetch the remote page
    $url = "remotesite/static/section35.html";
    $html = file_get_contents($url);

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // suppress warnings from malformed HTML
    $doc->loadHTML($html);

    $elements = $doc->getElementsByTagName('tbody');

    $toRemove = array();

    // Collect the parent node of every tbody whose text contains either keyword
    foreach ($elements as $el) {
        $text = $el->nodeValue;
        $matches = strpos($text, 'desktop') !== false
            || strpos($text, 'Recommended') !== false;
        if ($matches && !in_array($el->parentNode, $toRemove, true)) {
            $toRemove[] = $el->parentNode;
        }
    }

    // Remove the collected nodes from the document
    foreach ($toRemove as $node) {
        $node->parentNode->removeChild($node);
    }

    echo $doc->saveHTML(); // output the modified HTML
?>

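What I want to do now is remove every closing h3 tag (</h3>) from the source before echoing it to my page, because that is the only way the content displays correctly.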

1 Answer:

Answer 0: (score: 0)

echo str_replace('</h3>','',$doc->saveHTML());
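
Note that str_replace is case-sensitive and matches the literal string only, so the removal has to happen on the serialized string rather than in the DOM, since saveHTML() writes a closing tag for every h3 element. If the string might also contain variants such as </H3> or </h3 >, a regular expression is one way to cover them. The following is a minimal sketch under that assumption; the helper name stripClosingH3 and the sample markup are hypothetical and not part of the original question or answer.

```php
<?php
// Hypothetical helper (not from the original answer): removes closing h3 tags
// from an HTML string, tolerating uppercase letters and whitespace before ">".
function stripClosingH3(string $html): string
{
    // Matches </h3>, </H3> and </h3 >; opening <h3 ...> tags are left intact.
    // preg_replace returns null on error, in which case the input is returned as-is.
    return preg_replace('~</h3\s*>~i', '', $html) ?? $html;
}

// Sample markup standing in for the result of $doc->saveHTML().
$sample = '<h3>Title</h3><p>Body</p><H3 class="x">Other</H3 >';

echo stripClosingH3($sample);
// prints: <h3>Title<p>Body</p><H3 class="x">Other
```

In the question's script this would be used as echo stripClosingH3($doc->saveHTML()); in place of the plain echo.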