Fetching data from multiple pages using an incrementing page number

Time: 2018-10-04 10:41:22

Tags: php

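I have a script that fetches certain data from a page based on the content of span IDs. However, there are more than 200 pages of results to go through, and each page shows only 127 results.

The script does fetch the data for the 127 elements on the first page, but it then fails to open the next page and continue fetching. It stops after the first 127.

Any help would be great.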

$end = 200;
$start = 1;
$stop = $start + 10;
$html = file_get_contents('http://example.com/res/'.$start);
$doc = new DOMDocument();
@$doc->loadHTML($html);
echo $stop;
$i = 0;
foreach($doc->getElementsByTagName('span') as $element ) { //Loops through all available span elements
    if (!empty($element->attributes->getNamedItem('id')->value)) { // Discards irrelevant span elements based on their `ID`. A similar sorting is achieved with `empty()` as the target `span` doesn't have any associated `ID`.
        echo "Record : ".$i.' '. $element->attributes->getNamedItem('id')->value."\n"; 
        $i++;
        $end = $start;
    }
}
if($i == 127) {
    $i = 0;
    do {
        $next = $start++;
        $page = $next;
        $html = file_get_contents('http://example.com/res/'.$page);
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        foreach($doc->getElementsByTagName('span') as $element ) 
        { 
            if (!empty($element->attributes->getNamedItem('id')->value)) 
            { 
                echo "Record : ".$i.' '. $element->attributes->getNamedItem('id')->value."\n"; 
                $i++;
                $end = $start;
            }
        }

    } while ($page != $stop);
    //echo $i.' Records';
}

1 Answer:

Answer 0 (score: 0)

As noted in the comments, the last record displayed in the first loop is 127, so the $i++ on the line after the echo increments $i from 127 to 128.

foreach($doc->getElementsByTagName('span') as $element ) { //Loops through all available span elements
    if (!empty($element->attributes->getNamedItem('id')->value)) { // Discards irrelevant span elements based on their `ID`. A similar sorting is achieved with `empty()` as the target `span` doesn't have any associated `ID`.
        echo "Record : ".$i.' '. $element->attributes->getNamedItem('id')->value."\n"; 
        $i++; //At last iteration, $i = 128
        $end = $start;
    }
}

As a result, if($i == 127) evaluates to false.

I suggest changing the condition to if($i >= 127).
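
Going one step beyond the fix above, the check on $i can be avoided altogether by walking the pages in a single loop. The following is only a minimal sketch of that alternative, not the asker's or the answerer's code. The function names extractSpanIds and collectSpanIds are hypothetical, and the page fetcher is passed in as a callable so the loop can be tried without network access. Unlike the original !empty() check, this sketch keeps an ID whose value is "0".

```php
<?php
// Hypothetical helper: returns the id of every <span> that has a non-empty id.
function extractSpanIds(string $html): array
{
    $ids = [];
    if ($html === '') {
        return $ids; // loadHTML() rejects empty input, so return early
    }
    $doc = new DOMDocument();
    // Suppress parser warnings from malformed HTML, as the original code does with @.
    @$doc->loadHTML($html);
    foreach ($doc->getElementsByTagName('span') as $span) {
        $id = $span->getAttribute('id'); // empty string when the attribute is missing
        if ($id !== '') {
            $ids[] = $id;
        }
    }
    return $ids;
}

// Hypothetical helper: visits pages $first..$last in one loop.
// $fetch is any callable that maps a page number to an HTML string, or false on failure.
function collectSpanIds(callable $fetch, int $first, int $last): array
{
    $all = [];
    for ($page = $first; $page <= $last; $page++) {
        $html = $fetch($page);
        if ($html === false) {
            break; // stop at the first page that cannot be fetched
        }
        foreach (extractSpanIds($html) as $id) {
            $all[] = $id;
        }
    }
    return $all;
}
```

With the URL from the question, the fetcher could be function ($page) { return @file_get_contents('http://example.com/res/'.$page); }, called as collectSpanIds($fetcher, 1, 200). Because every page number is visited exactly once, no page is fetched twice and the result does not depend on each page holding exactly 127 records.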