Question

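I finally got my scraper working (sort of), but now I'd like to know how to make it move on to the next page automatically and scrape the same information from there. I'm using cURL to grab the whole page (otherwise I get a 500 error). Here's my code: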

<?php

// create a cURL handle for the first results page
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://example.com/results.asp?&j=t&page_no=1");

// return the response body as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// $html holds the page body, or false if the request failed
$html = curl_exec($ch);
if ($html === false) {
    die('cURL error: ' . curl_error($ch) . "\n");
}

// close the cURL handle to free up system resources
curl_close($ch);
// print $html . "\n";

require 'simple_html_dom.php';
$dom = new simple_html_dom();
$dom->load($html);
foreach ($dom->find("div[@id='schoolsearch'] tr") as $data) {
    $tds = $data->find("td");
    if (count($tds) == 3) {
        $record = array(
            'school' => trim($tds[1]->plaintext),
            'city'   => trim($tds[2]->plaintext)
        );
        print json_encode($record) . "\n";
        // appends one JSON object per line (not real CSV, despite the file name)
        file_put_contents('schools.csv', json_encode($record) . "\n", FILE_APPEND);
    }
}

?>

It's not perfect, but it works now! Does anyone know how to get to the next page?

Answer 1

Wrap it in a loop:

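$maxPages = 10;
for ($i = 1; $i <= $maxPages; $i++) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://example.com/results.asp?&j=t&page_no=$i");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($ch);
    curl_close($ch);

    // ...then parse $html and save the rows exactly as in the single-page script...
}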

You'll need to tidy it up so the library file isn't included again on every page, but you get the idea.
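A tidier version of the loop might look like the following sketch. It is a minimal illustration under several assumptions, not a tested scraper for this site. It swaps simple_html_dom for PHP's built-in DOMDocument/DOMXPath so it runs without the extra library. It assumes the same markup as the question (rows inside div#schoolsearch with exactly three td cells). Instead of a fixed page count, it assumes that requesting a page_no past the last page returns a page with no matching rows, and uses that as the stop signal. The helper names parseSchools, fetchPage and scrapeAllPages are hypothetical.

```php
<?php
// Sketch: paginate until a page yields no rows (or $maxPages is reached).
// Assumption: an out-of-range page_no returns a page with no matching rows.
// Parsing uses PHP's built-in DOMDocument/DOMXPath instead of simple_html_dom.

/**
 * Hypothetical helper: extract school/city records from one results page.
 * Assumes rows inside <div id="schoolsearch"> with exactly three <td> cells.
 */
function parseSchools(string $html): array
{
    $records = [];
    if (trim($html) === '') {
        return $records;                       // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);  // real-world HTML is rarely valid
    // note: loadHTML assumes ISO-8859-1 unless the page declares a charset
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query("//div[@id='schoolsearch']//tr") as $tr) {
        $tds = $xpath->query('td', $tr);       // direct <td> children of this row
        if ($tds->length === 3) {
            $records[] = [
                'school' => trim($tds->item(1)->textContent),
                'city'   => trim($tds->item(2)->textContent),
            ];
        }
    }
    return $records;
}

/**
 * Hypothetical helper: fetch one page with cURL; returns '' on failure,
 * which parseSchools() treats as an empty page (so the loop stops).
 */
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? '' : $html;
}

/**
 * Hypothetical helper: walk page_no=1,2,... and stop at the first page
 * with no matching rows. $fetch is injectable so the loop can be
 * exercised without touching the network.
 */
function scrapeAllPages(callable $fetch, int $maxPages = 100): array
{
    $all = [];
    for ($i = 1; $i <= $maxPages; $i++) {
        $rows = parseSchools($fetch("http://example.com/results.asp?&j=t&page_no=$i"));
        if ($rows === []) {
            break;                             // assumed to be past the last page
        }
        $all = array_merge($all, $rows);
    }
    return $all;
}
```

For real use you would call scrapeAllPages('fetchPage') and then write each record out as in the question. Passing the fetcher in as a callable keeps the pagination logic testable with canned HTML. Keeping $maxPages as an upper bound guards against a site that keeps serving the last page for any page number, in which case the empty-page check would never fire.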

Scraping the "next page" with PHP, cURL and simplehtmldom
