Loop not working in my PHP scraping script

Date: 2018-04-01 10:28:53

Tags: php json curl scrape

I wrote some code to scrape post titles and links from a website. The script looks like this:

<?php

$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, 'http://dunia21.tv/?s=fast+anf+furious');
curl_setopt($ch, CURLOPT_URL, 'http://dunia21.tv/?s=fast+anf+furious');

// html
$data = curl_exec($ch);
curl_close($ch);

//
$output = array();

// include simple html dom
require('./lib/simple_html_dom.php');

// parse the HTML string into a DOM object
$html = str_get_html($data);

// pick the material to process, i.e. class=mag-box wide-post-box
$bahan = $html->find('div[class=row content]', 0);

// get the post boxes inside li class=post-item
$kotak = $bahan->find('section[class=primary col-md-11]', 0);

// extract the boxes
foreach($kotak as $key => $val) {
    // title is in h3 class=post-title
    $header = $kotak->find('header[class=col-xs-12 entry-header]', 0);
    $title = $header->find('a[rel=bookmark]', 0)->innertext;
    $url = $header->find('a[rel=bookmark]', 0)->href;   

        $output[] = array(
        'title' => $title,
        'link' => $url
        );
}

print  '<pre>';
print_r($output);
print  '<pre>';
?>

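I expected this script to do what I want, and it does produce a response, but it only picks up the first title and none of the others. The output looks like this: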

Array
(
    [0] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [1] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [2] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [3] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [4] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [5] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [6] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [7] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

)

Any suggestions on how to fix this? Thanks.

1 answer:

Answer 0 (score: 0)

The code inside the foreach loop always queries the variable that holds the whole DOM section, so it always finds the first instance.

Change

$header = $kotak->find('header[class=col-xs-12 entry-header]', 0);

to

$header = $val->find('header[class=col-xs-12 entry-header]', 0);
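The idea of querying from the current loop element rather than from the whole container can be sketched as follows. This is only an illustration under stated assumptions: it uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so that it runs without extra files, and the HTML fragment is made up, since the real page markup is not shown in the question. One related point for simple_html_dom: passing an index such as 0 to find() returns a single element rather than a list, so the loop also needs a selection without the index to have several items to iterate over.

```php
<?php
// Hypothetical markup standing in for the search results page.
$html = <<<HTML
<section class="primary col-md-11">
  <article><header class="col-xs-12 entry-header"><a rel="bookmark" href="/first/">First</a></header></article>
  <article><header class="col-xs-12 entry-header"><a rel="bookmark" href="/second/">Second</a></header></article>
</section>
HTML;

// Collect parser warnings about HTML5 tags internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$output = array();

// Select every matching header, not just the first one,
// so that there is a list to loop over.
$headers = $xpath->query('//section//header[contains(@class, "entry-header")]');

foreach ($headers as $header) {
    // Query relative to the current header (second argument),
    // not relative to the whole document.
    $link = $xpath->query('.//a[@rel="bookmark"]', $header)->item(0);
    if ($link === null) {
        continue; // skip headers without a bookmark link
    }
    $output[] = array(
        'title' => trim($link->textContent),
        'link'  => $link->getAttribute('href'),
    );
}

print_r($output);
```

With the sample fragment above, $output should contain two distinct entries, one per header, instead of the first entry repeated.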