simple_html_dom - reading an HTML page, two arrays

Time: 2016-04-19 19:22:26

Tags: php html html-parsing simple-html-dom

This is my entire code:

// include the scraper
include('simple_html_dom.php');

// fetch the page to scrape
$html = file_get_html('http://www.niagarafallsreview.ca/news/local');

// make empty arrays
$headlines = array();
$links = array();

// look for 'h1' headings on the page
foreach($html->find('h1') as $header) {
    $headlines[] = $header->plaintext;
}

// look for 'a' links that start with 'http://www.niagarafallsreview.ca/2016/04/'
foreach($html->find('a[href^="http://www.niagarafallsreview.ca/2016/04/"]')  as $link) {
    $links[] = $link->href;
}

// trim the headlines because one on top and bottom were not needed
$output = array_slice($headlines, 1, -1); 

// for each header output a nice list of the headers 
foreach ($output as $headers){
    echo "<a href='#'>$headers</a>" . "<br />";
}

// make sure the links are unique and no doubles are found
$result = array_unique($links);

// for each link output it in a nice list
foreach ($result as $linkk){
    echo "<a href='$linkk'>$linkk</a>" . "<br />";
}   

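This code produces a nice list of the headlines, and also a nice list of the links.

My problem is that I need to combine them: I want $headers to be the link text and $linkk to be the href.

Like this...

<a href='$linkk'>$headers</a>

I don't know how to do this because I have two foreach statements. I tried to combine them, but I had no success.

Any help is greatly appreciated.

Thanks.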

2 answers:

Answer 0: (score: 0)

Here is the foreach you are looking for:

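// array_unique() keeps the original keys, so re-index before pairing by position
$result = array_values($result);

foreach ($output as $i => $headers) {
    $linkk = $result[$i];

    echo "<a href='$linkk'>$headers</a>" . "<br />";
}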

This assumes both arrays have the same length and that their items are in the same order.
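To make that assumption concrete, here is a minimal sketch of why the keys matter when pairing by position. The sample headlines and example.com URLs are hypothetical stand-ins for the scraped data, and $unique and $pairs are names introduced only for this illustration. It relies on array_unique() preserving the keys of the first occurrences, which leaves gaps that array_values() closes.

```php
<?php
// Hypothetical sample data standing in for the scraped arrays.
$output = array('First story', 'Second story', 'Third story');
$links  = array(
    'http://example.com/a',
    'http://example.com/a', // duplicate, e.g. a thumbnail link plus a headline link
    'http://example.com/b',
    'http://example.com/c',
);

// array_unique() keeps the original keys, so the keys here are 0, 2, 3.
$unique = array_unique($links);

// array_values() re-indexes to 0, 1, 2 so the keys line up with $output.
$result = array_values($unique);

// Pair each headline with the link at the same position.
$pairs = array();
foreach ($output as $i => $headline) {
    if (!isset($result[$i])) {
        break; // no matching link; stop rather than emit a broken anchor
    }
    $pairs[] = array('text' => $headline, 'href' => $result[$i]);
}
```

Without the array_values() call, $unique[1] would be undefined and the second headline would have no link, even though both arrays hold three items.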

Answer 1: (score: 0)

Try this:

// include the scraper
include('simple_html_dom.php');

// fetch the page to scrape
$html = file_get_html('http://www.niagarafallsreview.ca/news/local');

// make empty arrays
$headlines = array();
$links = array();

// look for 'h1' headings on the page
foreach($html->find('h1') as $header) {
    $headlines[] = $header->plaintext;
}

// look for 'a' links that start with 'http://www.niagarafallsreview.ca/2016/04/'
foreach($html->find('a[href^="http://www.niagarafallsreview.ca/2016/04/"]')  as $link) {
    $links[] = $link->href;
}

// trim the headlines because one on top and bottom were not needed
$output = array_slice($headlines, 1, -1); 

// make sure the links are unique, then re-index so the keys line up with $output
$result = array_values(array_unique($links));

// for each link, output it with the matching headline as the link text
foreach ($result as $i => $linkk) {
    $headline = isset($output[$i]) ? $output[$i] : '(empty)';
    echo "<a href='$linkk'>$headline</a>" . "<br />";
}
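
If the pairing is needed in more than one place, the same idea could be wrapped in a small function. This is only a sketch: build_link_list() is a hypothetical helper, not part of simple_html_dom, and the sample data is made up. It also escapes the values with htmlspecialchars() before printing them, with double_encode set to false so that entities already present in the scraped text are not encoded a second time; whether escaping is needed at all depends on how the parser returns the text, which is an assumption here.

```php
<?php
/**
 * Hypothetical helper: pair headlines and links by position and return HTML.
 * Links without a matching headline get the '(empty)' placeholder.
 */
function build_link_list(array $headlines, array $links)
{
    // Drop duplicate links and re-index so positions match the headlines.
    $links = array_values(array_unique($links));

    $html = '';
    foreach ($links as $i => $href) {
        $text = isset($headlines[$i]) ? $headlines[$i] : '(empty)';
        $html .= '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8', false) . '">'
               . htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false)
               . '</a><br />' . "\n";
    }
    return $html;
}

// Hypothetical sample data standing in for the scraped arrays.
$listHtml = build_link_list(
    array('Council votes & more'),
    array('http://example.com/a', 'http://example.com/a', 'http://example.com/b')
);
echo $listHtml;
```

Returning a string instead of echoing inside the loop keeps the pairing logic separate from the output, so the result can be printed, cached, or inspected.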