Limit the rows extracted by a web crawler

Date: 2017-02-10 05:13:54

Tags: php web-crawler

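So, I have this great web crawler code. It fetches the requested data from the specified site and outputs it together with the associated links. (Good boy.)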

Now the question is: how do I limit the extracted data to 5 rows? I tried using "LIMIT 5" (as we usually do in SQL queries from PHP), but it doesn't work.
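`LIMIT` is a SQL clause, so it only applies to a database query; here the rows come from splitting an HTML string, so any limit has to be applied to the PHP array instead. A minimal sketch using the built-in `array_slice()`, with a hypothetical `$rows` array standing in for the extracted rows:

```php
<?php
// Hypothetical list of already-extracted rows (stand-in for the crawler output).
$rows = ['row 1', 'row 2', 'row 3', 'row 4', 'row 5', 'row 6', 'row 7'];

// array_slice($array, $offset, $length) returns at most $length elements
// starting at $offset, similar to what "LIMIT 5" does for a SQL result set.
$firstFive = array_slice($rows, 0, 5);

foreach ($firstFive as $row) {
    echo $row, "\n";
}
```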

My code is as follows:

<div class="news-entry">
            <div class="newsblock">
                <div style="clear:both"></div>
                    <h2>
                       <a rel="nofollow" target="_blank" href="http://www.usmle-forums.com/usmle-step-3-forum/">
                            USMLE-Forums :: STEP-3         
                       </a>
                    </h2>
                <ul>
                    <?php
                        // Fetch a URL with cURL and return the response body as a string.
                        function get_datafour($url) {
                            $ch = curl_init();
                            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
                            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
                            curl_setopt($ch, CURLOPT_URL, $url);
                            $result = curl_exec($ch);
                            curl_close($ch);
                            return $result;
                        }
                        $returned_content = get_datafour('http://www.usmle-forums.com/usmle-step-3-forum/');
                        // Keep only the <tbody> that holds the thread list, then split it into rows.
                        $first_step = explode( '<tbody id="threadbits_forum_30">' , $returned_content );
                        $second_step = explode('</tbody>', $first_step[1]);
                        $third_step = explode('<tr>', $second_step[0]);
                        // print_r($third_step);
                        foreach ($third_step as $element) {
                            // Cut out the link inside the first <td class="alt1"> cell of the row.
                            $child_first = explode( '<td class="alt1"' , $element );
                            $child_second = explode( '</td>' , $child_first[1] );
                            $child_third = explode( '<a href=' , $child_second[0] );
                            $child_fourth = explode( '</a>' , $child_third[1] );
                            $final = "<a href=".$child_fourth[0]."</a><br>";
                    ?>
                    <li target="_blank" class="itemtitle">
                        <span class="item_new"></span><?php echo $final?>
                    </li>
                    <?php
                        }
                    ?>      
                </ul>        
                <div style="clear:both"></div>
            </div>
        </div>
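To see what the `explode()` chain above produces, here is a hypothetical helper, `extract_thread_links()`, that runs the same steps on a small hand-written HTML sample instead of the live forum page. It adds `isset()` guards so fragments lacking the expected markers are skipped rather than raising "undefined offset" notices. It is a sketch that assumes the page markup matches the markers used in the question.

```php
<?php
/**
 * Hypothetical helper: repeats the question's explode() chain on an HTML
 * string and returns one "<a href=...>...</a>" string per thread row.
 */
function extract_thread_links(string $html): array
{
    $links = [];

    // Isolate the <tbody> that holds the thread list.
    $first_step = explode('<tbody id="threadbits_forum_30">', $html);
    if (!isset($first_step[1])) {
        return $links; // marker not found: layout changed or fetch failed
    }
    $second_step = explode('</tbody>', $first_step[1]);

    // Element 0 is whatever precedes the first <tr>, so it is never a row.
    $third_step = explode('<tr>', $second_step[0]);

    foreach ($third_step as $element) {
        $child_first = explode('<td class="alt1"', $element);
        if (!isset($child_first[1])) {
            continue; // no thread cell in this fragment
        }
        $child_second = explode('</td>', $child_first[1]);
        $child_third = explode('<a href=', $child_second[0]);
        if (!isset($child_third[1])) {
            continue; // cell without a link
        }
        $child_fourth = explode('</a>', $child_third[1]);
        $links[] = '<a href=' . $child_fourth[0] . '</a>';
    }

    return $links;
}

// Small hand-written sample standing in for the forum page.
$sample = '<tbody id="threadbits_forum_30">' . "\n"
    . '<tr><td class="alt1"><a href="t1.html">Thread 1</a></td></tr>' . "\n"
    . '<tr><td class="alt1"><a href="t2.html">Thread 2</a></td></tr>' . "\n"
    . '</tbody>';

print_r(extract_thread_links($sample));
```

Because the helper returns a plain array, a 5-row limit could then be applied with `array_slice($links, 0, 5)` before printing.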

Any suggestions are appreciated.

1 Answer:

Answer 0 (score: 1)

Break out of the foreach loop after the 5th result:

foreach ($third_step as $key => $element) {
    // Your logic here
    if ($key == 4) {
        break;
    }
}
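The check can be tried on a plain array. In this sketch, a hypothetical seven-element `$third_step` stands in for the crawler's fragments, and `$processed` records which elements the loop body handled:

```php
<?php
// Hypothetical stand-in for $third_step: seven fragments with keys 0..6.
$third_step = ['a', 'b', 'c', 'd', 'e', 'f', 'g'];

$processed = [];
foreach ($third_step as $key => $element) {
    $processed[] = $element; // "Your logic here": runs before the check

    // Keys start at 0, so key 4 is the fifth element; stop after handling it.
    if ($key == 4) {
        break;
    }
}

echo implode(',', $processed), "\n"; // prints "a,b,c,d,e"
```

The per-element work runs before the `if`, so the element at key 4 is still handled before the loop exits; placing the check first would stop after four elements.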

We use `$key == 4` because array indexes start at 0, so key 4 is the fifth element. Hope you get the idea.
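One caveat, assuming the loop from the question is used unchanged: `explode('<tr>', ...)` always returns the text before the first `<tr>` as element 0, so that fragment is not a thread row. With `$key == 4`, the loop handles keys 0 to 4, so (if every later fragment is a thread row) only four real threads are shown, next to a broken entry for element 0. A sketch that counts only rows actually output, using a hypothetical `$limit` counter and sample data in place of the live page:

```php
<?php
// Hypothetical stand-in for $third_step = explode('<tr>', ...): element 0 is
// the text before the first <tr>, the rest are seven thread rows.
$third_step = ["\n"];
for ($i = 1; $i <= 7; $i++) {
    $third_step[] = '<td class="alt1"><a href="t' . $i . '.html">Thread ' . $i
        . '</a></td></tr>' . "\n";
}

$limit = 5;   // maximum number of thread rows to output
$shown = 0;   // thread rows output so far
$items = [];  // collected rows (the crawler would echo an <li> instead)

foreach ($third_step as $element) {
    // Skip fragments without a thread cell, such as the prefix at index 0;
    // they do not count towards the limit.
    if (strpos($element, '<td class="alt1"') === false) {
        continue;
    }

    $items[] = $element; // the question's per-row extraction would go here

    $shown++;
    if ($shown >= $limit) {
        break; // stop once $limit rows have been output
    }
}

echo count($items), "\n"; // prints 5
```

Counting emitted rows rather than array keys keeps the limit correct even when some fragments are skipped.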