Improving HTML parsing with DOM in PHP

Date: 2016-03-19 15:56:48

Tags: php html

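I wrote the following code to display the content of an external website on an otherwise blank page. To do that I had to delete a number of nodes, writing a separate block of code for each node to remove. On a large page this is barely feasible.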

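My questions:

  1. Is there a way to put everything we want to remove (footer, header, headerContent, etc.) in one place?

  2. Is there a smarter way to clean up the page than deleting elements, so that only the content I want (TABELA1) is shown?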

                                    

            # Create a DOM parser object
            $dom = new DOMDocument();
            libxml_use_internal_errors(true);
            $dom->loadHTMLFile('http://www.sptrans.com.br/sac/solicitacoes.aspx');
            $data = $dom->getElementById('TABELA1');
    
    
            $xpath = new DOMXPath($dom);
            foreach($xpath->query('//div[contains(attribute::id, "novidadeDestaque")]') as $e ) {
                // Delete this node
                $e->parentNode->removeChild($e);
            }
    
            $xpath = new DOMXPath($dom);
            foreach($xpath->query('//div[contains(attribute::id, "headerLvl1")]') as $e ) {
                // Delete this node
                $e->parentNode->removeChild($e);
            }
    
            $xpath = new DOMXPath($dom);
            foreach($xpath->query('//div[contains(attribute::id, "headerContent")]') as $e ) {
                // Delete this node
                $e->parentNode->removeChild($e);
            }
    
            $xpath = new DOMXPath($dom);
            foreach($xpath->query('//div[contains(attribute::id, "novo_menu")]') as $e ) {
                // Delete this node
                $e->parentNode->removeChild($e);
            }
            $xpath = new DOMXPath($dom);
            foreach($xpath->query('//div[contains(attribute::id, "footer")]') as $e ) {
                // Delete this node
                $e->parentNode->removeChild($e);
            }
            $xpath = new DOMXPath($dom);
            foreach($xpath->query('//div[contains(attribute::id, "header")]') as $e ) {
                // Delete this node
                $e->parentNode->removeChild($e);
            }       
            $xpath = new DOMXPath($dom);
            foreach($xpath->query('//div[contains(attribute::id, "pageNovidades")]') as $e ) {
                // Delete this node
                $e->parentNode->removeChild($e);
            }   
    
                echo $dom->saveHTML();                          
                    ?>
    </body>
    

1 answer:

Answer 0 (score: 1)

To write a short routine that removes the unwanted elements, you can use an array:

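    // One DOMXPath object is enough for all queries on $dom
    $xpath = new DOMXPath($dom);
    $idToDelete = [ 'novidadeDestaque', 'headerLvl1', /* ... */ ];

    foreach ($idToDelete as $id) {
        // Remove every <div> whose id contains the current string
        foreach ($xpath->query('//div[contains(attribute::id, "' . $id . '")]') as $e) {
            $e->parentNode->removeChild($e);
        }
    }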

Note that you do not need to create a new DOMXPath object for every search: creating it once per DOMDocument object is enough.
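
As a rough, self-contained sketch of the same idea, the loop can be wrapped in a function that reuses a single DOMXPath object. The sample markup and the function name `removeDivsById` are hypothetical and only serve the illustration. The ids are assumed to contain no double quotes, since they are concatenated into the XPath expression.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<html><body>'
      . '<div id="header">Header</div>'
      . '<div id="novo_menu">Menu</div>'
      . '<table id="TABELA1"><tr><td>Data</td></tr></table>'
      . '<div id="footer">Footer</div>'
      . '</body></html>';

/**
 * Remove every <div> whose id contains one of the given strings.
 * Returns the number of removed nodes. (Hypothetical helper.)
 */
function removeDivsById(DOMDocument $dom, array $ids): int
{
    $xpath   = new DOMXPath($dom); // created once, reused for every query
    $removed = 0;

    foreach ($ids as $id) {
        foreach ($xpath->query('//div[contains(@id, "' . $id . '")]') as $node) {
            $node->parentNode->removeChild($node);
            $removed++;
        }
    }

    return $removed;
}

libxml_use_internal_errors(true); // silence warnings about imperfect HTML
$dom = new DOMDocument();
$dom->loadHTML($html);

$count  = removeDivsById($dom, ['header', 'novo_menu', 'footer']);
$result = $dom->saveHTML();
echo $result;
```

Because `contains()` matches substrings, an entry such as `'header'` also matches ids like `headerLvl1` and `headerContent`, so the list can often be shorter than the set of ids on the page.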

To display only the content you want, you can use this syntax:

    // Serialize only the selected node instead of the whole document
    $table = $dom->getElementById( 'MyTable' );
    echo $dom->saveHTML( $table );
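
A minimal runnable version of this approach might look like the following. The markup is a hypothetical stand-in for the real page, and the null check is an addition: `getElementById()` returns `null` when no element carries the requested id.

```php
<?php
// Hypothetical markup; in the question it would come from loadHTMLFile().
$html = '<html><body><div id="header">Header</div>'
      . '<table id="MyTable"><tr><td>Cell</td></tr></table></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$table = $dom->getElementById('MyTable');

// Guard against a missing element before serializing it.
$output = ($table !== null) ? $dom->saveHTML($table) : '';
echo $output;
```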

To get a complete HTML document that contains only the desired table, you can create a new DOMDocument and add your table to it with importNode:

    $src = new DOMDocument();
    $dst = new DOMDocument();

    $src->loadHTML( $html );
    $dst->loadHTML( '<html><head><title>Untitled</title></head><body></body></html>' );

    $table    = $src->getElementById( 'MyTable' );
    // Pass true so the rows and cells are imported along with the table
    $imported = $dst->importNode( $table, true );

    $dst->getElementsByTagName( 'body' )->item(0)->appendChild( $imported );

    echo $dst->saveHTML();
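
One detail worth checking when using `importNode()` is its second argument, `$deep`, which defaults to `false`. The sketch below, built on hypothetical sample markup, compares a shallow and a deep import: only the deep one brings the table's rows and cells along.

```php
<?php
// Hypothetical source markup containing the table to copy.
$html = '<table id="MyTable"><tr><td>Cell</td></tr></table>';

libxml_use_internal_errors(true);

$src = new DOMDocument();
$src->loadHTML($html);
$table = $src->getElementById('MyTable');

$dst = new DOMDocument();
$dst->loadHTML('<html><head><title>Untitled</title></head><body></body></html>');

$shallow = $dst->importNode($table);        // children are not copied
$deep    = $dst->importNode($table, true);  // copies the whole subtree

// Only the deep copy is attached to the new document.
$dst->getElementsByTagName('body')->item(0)->appendChild($deep);

$result = $dst->saveHTML();
echo $result;
```

Here `$shallow->hasChildNodes()` is `false` while `$deep->hasChildNodes()` is `true`, so the deep import is the one to append when the table content should appear in the output.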