Collecting links from multiple pages with QueryPath

Asked: 2012-09-12 19:17:20

Tags: php arrays querypath

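I have a function that is passed an array of URLs. Each of those web pages contains a series of links to other pages. I want to return the complete list of those links from every page passed to the function. I'm stuck on how to combine the arrays across iterations of the loop.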

 function getitemurls ($pagelinks) {
  global $host;
  foreach ($pagelinks as $link) {
   $circdl = my_curl($link);
   $circqp = htmlqp($circdl,'body');
   $circlinks = array();
   foreach ($circqp->branch()->top('area[href]') as $item) {
    $circlinks[] = $item->attr('href');
   }
   for ($i = 0; $i < count($circlinks); ++$i) {
    $fullitemurl = join(array($host,$circlinks[$i]));
   }
  }
  return $fullitemurl;
 }

For example:

 Webpage 1: page1.html
 <html><body><area shape="rect" href="http://www.google.com" coords="110,151,173,225" alt=""/></body></html>

 Webpage 2: page2.html
 <html><body><area shape="rect" href="http://www.yahoo.com" coords="110,151,173,225" alt=""/></body></html>

Here is the array containing the two pages:

 $array = array (
  "0" => "page1.html",
  "1" => "page2.html",
 );

From this array I would like to get back:

 getitemurls($array)
 Array ( [0] => http://www.google.com [1] => http://www.yahoo.com)

1 Answer:

Answer 0 (score: 0)

I ended up declaring my array before the loop and then appending to it inside the loop:

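 function getitemurls ($pagelinks) {
  global $host;
  // Accumulator declared once, before the loop, so it persists across pages.
  $fullitemurls = array();
  foreach ($pagelinks as $link) {
   $circdl = my_curl($link);
   $circqp = htmlqp($circdl, 'body');
   $circlinks = array();
   foreach ($circqp->branch()->top('area[href]') as $item) {
    $circlinks[] = $item->attr('href');
   }
   // Append each link (prefixed with $host) instead of overwriting a scalar.
   foreach ($circlinks as $circlink) {
    $fullitemurls[] = $host . $circlink;
   }
  }
  // Same name as the declaration above, so an empty array is returned when no links are found.
  return $fullitemurls;
 }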
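
To see the accumulation pattern on its own, here is a minimal sketch that runs without `my_curl` or QueryPath. It assumes PHP's bundled DOM extension is available and takes the HTML as strings rather than fetching it. The helper names `extract_area_hrefs` and `collect_area_hrefs` are hypothetical, made up for this illustration, and the sample markup is the two pages from the question.

```php
<?php
// Hypothetical helper: return every <area href="..."> value found in one HTML string.
// Uses PHP's built-in DOMDocument in place of QueryPath (an assumption for this sketch).
function extract_area_hrefs(string $html): array {
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = array();
    foreach ($doc->getElementsByTagName('area') as $area) {
        if ($area->hasAttribute('href')) {
            $hrefs[] = $area->getAttribute('href');
        }
    }
    return $hrefs;
}

// Hypothetical helper: gather the links of all pages into one flat array.
function collect_area_hrefs(array $pages): array {
    $all = array(); // declared once, before the loop
    foreach ($pages as $html) {
        foreach (extract_area_hrefs($html) as $href) {
            $all[] = $href; // append to the accumulator, never overwrite it
        }
    }
    return $all;
}

// The two sample pages from the question, as HTML strings.
$pages = array(
    '<html><body><area shape="rect" href="http://www.google.com" coords="110,151,173,225" alt=""/></body></html>',
    '<html><body><area shape="rect" href="http://www.yahoo.com" coords="110,151,173,225" alt=""/></body></html>',
);

print_r(collect_area_hrefs($pages));
```

Because the accumulator is created before the outer loop and only ever appended to, a page with no matching links simply contributes nothing, and an empty input yields an empty array. Replacing the inner loop with `$all = array_merge($all, extract_area_hrefs($html));` should give the same result for numerically indexed arrays.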