simplehtmldom - follow links

Date: 2013-11-26 08:24:52

Tags: php simple-html-dom

Can someone show an example of how to follow the link in each <a href> element while scraping, and collect the related information?

$html = file_get_html('http://www.blabla.com/');
$html->find('div', 1)->class = 'bar';

Each <li> contains a link to a page with more information:

<li class="#Selected">
<a href="/contactinfo/ITService/">info</a>
<h2>New York</h2>
<h3>USA</h3>
<strong>ITService</strong>
</li>

The linked page then contains:

<div class="InfoD">
<h2>New York</h2>
<h3>USA</h3>
<strong>ITService</strong>
<p>
Tel. : XXXXXX   
</p>
<p>
Mail. : XXXX@XXX.com    
</p>
</div>
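Assuming each detail page looks like the InfoD block above, its labelled fields could be turned into an array once the page is loaded. The sketch below uses PHP's built-in DOM extension (DOMDocument/DOMXPath) rather than Simple HTML DOM, and the helper names firstText/parseInfoD and the array keys "city", "country" and "service" are hypothetical:

```php
<?php
// Sketch only: PHP's built-in DOM extension, not Simple HTML DOM.
// Helper names and the keys "city"/"country"/"service" are hypothetical.

// Trimmed text of the first node matching $query under $context, or '' if none.
function firstText(DOMXPath $xpath, string $query, DOMNode $context): string
{
    $nodes = $xpath->query($query, $context);
    return ($nodes !== false && $nodes->length > 0) ? trim($nodes->item(0)->textContent) : '';
}

// Returns the InfoD fields as an array, or null if the page has no div.InfoD.
function parseInfoD(string $html): ?array
{
    if (trim($html) === '') {
        return null;                        // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);       // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // First <div> whose class list contains the token "InfoD".
    $div = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " InfoD ")]')->item(0);
    if ($div === null) {
        return null;
    }

    $info = [
        'city'    => firstText($xpath, './/h2', $div),
        'country' => firstText($xpath, './/h3', $div),
        'service' => firstText($xpath, './/strong', $div),
    ];
    // Each <p> looks like "Tel. : XXXXXX"; split on the first colon only,
    // so values that themselves contain ":" stay intact.
    foreach ($xpath->query('.//p', $div) as $p) {
        $parts = explode(':', $p->textContent, 2);
        if (count($parts) === 2) {
            $label = rtrim(trim($parts[0]), '.'); // "Tel." -> "Tel"
            $info[$label] = trim($parts[1]);
        }
    }
    return $info;
}
```

With the sample markup above, the result would contain "Tel" and "Mail" keys next to the three heading fields.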

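I know how to scrape these elements with HTML DOM, but I don't know how to handle it when every element links to another page and there are multiple pages. Could someone point me to an example or a similar tutorial? Thanks.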

1 answer:

Answer 0 (score: 2)

First, collect all the links inside the <li class="#Selected"> elements, then loop over them and get div.InfoD from each linked page.
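The same two-step idea (collect the links, then visit each one) can be sketched with PHP's built-in DOM extension instead of Simple HTML DOM. The function names and the $fetch callable are hypothetical; $fetch maps a URL to an HTML string so the logic can run offline, and the sketch assumes root-relative hrefs such as "/contactinfo/ITService/":

```php
<?php
// Sketch only: built-in DOMDocument/DOMXPath, not Simple HTML DOM.
// crawlListing(), xpathFor() and $fetch are hypothetical names.

function xpathFor(string $html): DOMXPath
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);     // tolerate messy HTML
    // loadHTML() rejects an empty string, so fall back to an empty document.
    $doc->loadHTML($html !== '' ? $html : '<html></html>');
    libxml_clear_errors();
    return new DOMXPath($doc);
}

/** @return array<string,string> detail URL => whitespace-collapsed text of div.InfoD */
function crawlListing(string $listingUrl, string $baseUrl, callable $fetch): array
{
    $xpath = xpathFor($fetch($listingUrl));

    // Step 1: every <a href> inside <li class="#Selected">.
    $hrefs = [];
    foreach ($xpath->query('//li[@class="#Selected"]//a[@href]') as $a) {
        $hrefs[] = $a->getAttribute('href');
    }

    // Step 2: follow each distinct link and keep the InfoD block's text.
    $results = [];
    foreach (array_unique($hrefs) as $href) {
        $url = $baseUrl . $href;          // assumes root-relative hrefs
        $detail = xpathFor($fetch($url));
        $div = $detail->query('//div[contains(concat(" ", normalize-space(@class), " "), " InfoD ")]')->item(0);
        if ($div !== null) {
            $results[$url] = trim(preg_replace('/\s+/', ' ', $div->textContent));
        }
    }
    return $results;
}
```

In real use, $fetch could be a closure that returns (string) file_get_contents($url); passing it in as a parameter keeps the link-following logic separate from the network code.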

Here is a snippet showing how:

// includes Simple HTML DOM Parser
include "simple_html_dom.php";

$url = "http://www.blabla.com/";

$baseUrl = "http://www.blabla.com";

//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a URL
$html->load_file($url);

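// Get all links inside <li class="#Selected">
// (the class value starts with "#", so match it as an attribute rather than ".#Selected")
$anchors = $html->find('li[class="#Selected"] a');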

// Loop through each link and get the node with the "InfoD" class.
// Each time, clear the DOM object to avoid memory leaks.
foreach ($anchors as $anchor) {

    // Create the new link to parse
    $urlTemp = $baseUrl . $anchor->href;

    //Create a DOM object
    $html2 = new simple_html_dom();
    // Load HTML from a URL
    $html2->load_file($urlTemp);

    // Get the first div with class "InfoD" from the linked page
    $div = $html2->find('div.InfoD', 0);

    echo $div;
    echo "<hr/>";

    // Clear dom object
    $html2->clear();
    unset($html2);

}

// Clear dom object
$html->clear(); 
unset($html);
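The snippet builds $urlTemp as $baseUrl . $anchor->href, which only works for root-relative hrefs like "/contactinfo/ITService/". If some links are absolute or relative to the current page, a helper along these lines could be used instead. It is a simplified, hypothetical sketch (not part of Simple HTML DOM): it does not normalise "./" or "../" segments and does not treat "?query" or "#fragment"-only hrefs specially.

```php
<?php
// Hypothetical helper: resolve an href against the page URL it came from.
// Simplified: no "./" or "../" normalisation, no special "?"/"#" handling.
function absoluteUrl(string $baseUrl, string $href): string
{
    if (parse_url($href, PHP_URL_SCHEME) !== null) {
        return $href;                        // already absolute, e.g. "http://..."
    }
    $parts  = parse_url($baseUrl);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';

    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;        // protocol-relative "//host/path"
    }
    if (strpos($href, '/') === 0) {
        return "$scheme://$host$port$href";  // root-relative "/path"
    }
    // Plain relative "path": resolve against the base URL's directory.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return "$scheme://$host$port$dir$href";
}
```

Inside the loop, $urlTemp = absoluteUrl($url, $anchor->href); would then replace the plain concatenation.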