Question

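I just want to scrape an HTML page with PHP. I looked up how to do it on Google and went with the file_get_contents() approach. I wrote some code, but I'm already getting an error I can't figure out: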

    $page = file_get_contents( 'http://php.net/supported-versions.php' );
    $doc = new DOMDocument( $page );
    //print_r( $page );

    foreach ( $doc->getElementsByTagName( 'table' ) as $node ) {
        print_r( $node );
    }

The first, commented-out print_r statement DOES print the page, but the foreach loop, which should put each table into $node, prints nothing. What am I doing wrong?

Answer 1

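You are loading the DOMDocument incorrectly: the constructor does not parse markup, so you need ->loadHTMLFile() or something similar. See the DOMDocument documentation.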

Here is what you need to do.

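    <?php
    // Collect libxml parse warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Fetch and parse the page in one step.
    $doc->loadHTMLFile("http://php.net/supported-versions.php");
    foreach ($doc->getElementsByTagName('table') as $table) {
        var_dump($table);
    }
    ?>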

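The line libxml_use_internal_errors(true); keeps warnings from being emitted while the HTML is loaded. The parser raises them because the nav and section tags are not part of the "correct" HTML it expects.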
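
If you would rather keep file_get_contents() from the question, the "or something similar" route is ->loadHTML(), which parses a markup string you already have. The following is only a minimal sketch under some assumptions: the inline $page string is a made-up stand-in for the fetched page (so the example runs without network access), and the $rows array is a name introduced here purely for illustration.

```php
<?php
// Hypothetical sample markup standing in for the fetched page. In the
// question, this string would come from file_get_contents().
$page = '<nav>menu</nav><section><table>'
      . '<tr><th>Branch</th><th>Status</th></tr>'
      . '<tr><td>A</td><td>active</td></tr>'
      . '</table></section>';

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Parse the markup string. The constructor arguments are the XML version
// and encoding, so the markup must go through loadHTML() instead.
$doc->loadHTML($page);

// Discard any warnings that were collected during parsing.
libxml_clear_errors();

// Gather the text of every cell, one inner array per table row.
$rows = [];
foreach ($doc->getElementsByTagName('table') as $table) {
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Skip whitespace text nodes; keep only th/td elements.
            if ($cell instanceof DOMElement) {
                $cells[] = trim($cell->textContent);
            }
        }
        $rows[] = $cells;
    }
}

print_r($rows);
```

Reading textContent from each cell usually gives more useful output than var_dump() on the node itself, which dumps the DOM object rather than the table data.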
