Recursion or a simple PHP loop

Time: 2012-01-18 02:15:52

Tags: php recursion web-scraping

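I'm having some trouble working out how to structure this loop:

I'm building a small scraper for myself, and I'm trying to figure out how to loop between two methods until all the links have been retrieved from a website.

I can already retrieve the links from the first page, but the problem is that I can't work out how to loop back over the new links that have just been extracted:

Here is my code: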

    $scrap->fetchlinks($url); // scrape the links from the first page of the website

    // for each link found, insert the URL into the DB with status = "n"
    foreach ($scrap->results as $result) {
        if ($result) {
            echo "$result \n";
            $crawler->insertUrl($result);

            // select all links with status = "n" in order to scrape the stored links
            $urlStatusNList = $crawler->selectUrlByStatus("n");

            while (sizeof($urlStatusNList) > 1) {
                foreach ($urlStatusNList as $sl) {
                    $scrap->fetchlinks($sl->url);  // I suppose this retrieves all the new sublinks
                    $crawler->insertUrl($sl->url); // insert the sublinks into the DB
                    $crawler->updateUrlByIdStatus($sl->id, "s"); // mark the scraped link with status = "s" so it is not checked again

                    // here I would like to repeat the loop for each new link in the DB with status = "n" until no more links can be retrieved and the script stops
                }
            }
        }
    }

Any kind of help is very welcome. Thanks in advance!

1 Answer:

Answer 0: (score: 1)

In pseudocode, you are looking for something like this:

do
{
    grab new links from the not yet extracted pages, add them to the database,
    and mark those pages as extracted

} while( count of links in the database not yet extracted > 0 )

This keeps going without any recursion...
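
As a rough illustration of the do/while idea, here is a minimal PHP sketch. It makes several assumptions: the `$site` array is a made-up in-memory stand-in for the website, `fetchLinks()` is a hypothetical replacement for `$scrap->fetchlinks()`, and the `$status` array plays the role of the DB table with its "n"/"s" status column. The point is only that the list of pending links is selected again on every pass, and that only links not seen before are inserted, so the loop ends once nothing with status "n" is left.

```php
<?php
// Hypothetical in-memory stand-in for the site: page URL => links found on that page.
$site = [
    'http://example.com/'  => ['http://example.com/a', 'http://example.com/b'],
    'http://example.com/a' => ['http://example.com/b', 'http://example.com/c'],
    'http://example.com/b' => ['http://example.com/'],
    'http://example.com/c' => [],
];

// Hypothetical stand-in for $scrap->fetchlinks(): returns the links on one page.
function fetchLinks(array $site, string $url): array
{
    return $site[$url] ?? [];
}

// Crawl from $startUrl until no URL with status "n" remains.
// $status stands in for the DB table: url => "n" (not scraped) or "s" (scraped).
function crawl(array $site, string $startUrl): array
{
    $status = [$startUrl => 'n'];

    do {
        // Plays the role of selectUrlByStatus("n"); re-run on every pass.
        $pending = array_keys($status, 'n', true);

        foreach ($pending as $url) {
            foreach (fetchLinks($site, $url) as $link) {
                // Insert only unseen URLs, so scraped pages are never reset to "n".
                if (!isset($status[$link])) {
                    $status[$link] = 'n';
                }
            }
            // Plays the role of updateUrlByIdStatus($id, "s").
            $status[$url] = 's';
        }
    } while (in_array('n', $status, true));

    return array_keys($status);
}

$found = crawl($site, 'http://example.com/');
print_r($found);
```

With a real database, the same shape would presumably apply: insert the sublinks returned by the scraper (not the URL that was just scraped), skip URLs that are already stored, and run the status = "n" query again at the top of each pass instead of once before the loop.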