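I'm trying to build a simple crawler. The crawler itself works, but I'd like to output some messages from inside the recursive function so I can see how many pages are left to crawl in the $crawling array and which page is currently being crawled.
The relevant code is below. I have two echo statements inside the function, but nothing is output until the script finishes. Is it possible to output messages along the way from inside a recursive function?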
$alreadyCrawled = array();
$crawling = array();

function followLinks($url) {
    global $alreadyCrawled;
    global $crawling;

    echo "Now crawling: $url";

    // Get the links
    $parser = new DomDocumentParser($url);
    $linkList = $parser->getLinks();

    for ($i = 0; $linkList->length > $i; $i++) {
        $href = $linkList->item($i)->getAttribute("href");

        // Skip fragment, javascript: and mailto links
        if (strpos($href, "#") !== false) {
            continue;
        } else if (substr($href, 0, 11) === "javascript:") {
            continue;
        } else if (substr($href, 0, 6) === "mailto") {
            continue;
        }

        // Convert relative links to absolute links
        $href = createLink($href, $url);

        // Crawl page
        if (!in_array($href, $alreadyCrawled)) {
            $alreadyCrawled[] = $href;
            $crawling[] = $href;
            getDetails($href);
        }
    }

    array_shift($crawling); // Remove page just crawled
    echo "Finished crawling: $url, Pages left to crawl: " . count($crawling);

    // Crawl until array is empty
    foreach ($crawling as $site) {
        followLinks($site);
    }
}
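Answer 0 (score: 0)
After looking at nandal's answer and the link CBroe posted to a possible duplicate, I ended up with the function below. Calling it after each echo does the trick.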
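function flush_buffers() {
    // Send and close the current output buffer, if one is active
    if (ob_get_level() > 0) {
        ob_end_flush();
    }
    // Flush an outer buffer if one remains (avoids a "no buffer" notice)
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush(); // Push PHP's system write buffer to the server or terminal
    ob_start(); // Start a fresh buffer for the next batch of output
}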
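As a rough sketch of how "flush after each echo" might look inside a recursive function, the example below wraps the echo and the flush in one helper. The names emit_progress and countdown are made up for illustration and are not from the question; countdown only stands in for followLinks. Whether a browser shows the output immediately can also depend on buffering or compression in the web server, which this sketch does not address.

```php
<?php
// Hypothetical helper (not from the question): print a message, then push it out right away.
function emit_progress(string $message): void {
    echo $message, PHP_EOL;
    // Flush PHP's own output buffer only if one is active, to avoid a notice
    if (ob_get_level() > 0) {
        ob_flush();
    }
    // Hand the output on to the web server or terminal
    flush();
}

// Hypothetical stand-in for followLinks(): recurses and reports progress at each step.
// Returns the number of recursive steps taken.
function countdown(int $remaining): int {
    emit_progress("Pages left to crawl: $remaining");
    if ($remaining === 0) {
        return 0;
    }
    usleep(100000); // simulate slow work (0.1 s) so the incremental output is visible
    return 1 + countdown($remaining - 1);
}

$steps = countdown(3);
```

Run from the command line, the four progress lines should appear one at a time rather than all at once when the script ends. In the crawler, the same helper could replace the two bare echo statements in followLinks.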