Retrieving elements with XPath and DOMDocument

Date: 2012-09-22 20:13:21

Tags: php xpath domdocument

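I have a list of ads in the HTML below. What I need is a PHP loop that extracts the following for each ad:

  1. The ad URL (the href attribute of the <a> tag)
  2. The ad image URL (the src attribute of the <img> tag)
  3. The ad title (the HTML content of the <div class="title"> tag)

    <div class="ads">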
        <a href="http://path/to/ad/1">
            <div class="ad">
                <div class="image">
                    <div class="wrapper">
                        <img src="http://path/to/ad/1/image.jpg">
                    </div>
                </div>
                <div class="detail">
                    <div class="title">Ad #1</div>
                </div>
            </div>
        </a>
        <a href="http://path/to/ad/2">
            <div class="ad">
                <div class="image">
                    <div class="wrapper">
                        <img src="http://path/to/ad/2/image.jpg">
                    </div>
                </div>
                <div class="detail">
                    <div class="title">Ad #2</div>
                </div>
            </div>
        </a>
    </div>
    

I managed to get the ad URLs with the PHP code below.

    $d = new DOMDocument();
    $d->loadHTML($ads); // the variable $ads contains the HTML code above
    $xpath = new DOMXPath($d);
    $ls_ads = $xpath->query('//a');
    
    foreach ($ls_ads as $ad) {
        $ad_url = $ad->getAttribute('href');
        print("AD URL : $ad_url");
    }
    

But I could not get the other two elements (the image URL and the title). Any ideas?

2 Answers:

Answer 0: (score: 10)

For the other elements, you can do the same thing inside the loop:

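foreach ($ls_ads as $ad) {
    $ad_url = $ad->getAttribute('href');
    print("AD URL : $ad_url\n");

    // Copy this <a> subtree into its own document so "//" queries only see this ad.
    // A new DOMDocument has no documentElement yet, so append to the document itself,
    // and pass true to importNode() so the children are copied too.
    $ad_Doc = new DOMDocument();
    $ad_Doc->appendChild($ad_Doc->importNode($ad, true));
    $ad_xpath = new DOMXPath($ad_Doc);

    // Both queries return a DOMNodeList; use ->item(0) to reach the first match.
    $img_src = $ad_xpath->query("//img[@src]");
    $title = $ad_xpath->query("//div[@class='title']");
}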
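
An alternative to building a separate document per ad is to pass the `<a>` node as the context node (the second argument of `DOMXPath::query()`) and use relative expressions starting with `.`. The following is only a minimal sketch: the shortened sample markup stands in for the question's `$ads`, and the `$results` array and its keys are illustrative names, not part of the original code.

```php
<?php
// Shortened stand-in for the question's $ads markup (assumption for this sketch).
$ads = '<div class="ads">
    <a href="http://path/to/ad/1">
        <div class="ad">
            <img src="http://path/to/ad/1/image.jpg">
            <div class="title">Ad #1</div>
        </div>
    </a>
    <a href="http://path/to/ad/2">
        <div class="ad">
            <img src="http://path/to/ad/2/image.jpg">
            <div class="title">Ad #2</div>
        </div>
    </a>
</div>';

libxml_use_internal_errors(true); // keep HTML parser warnings out of the output
$d = new DOMDocument();
$d->loadHTML($ads);
libxml_clear_errors();
$xpath = new DOMXPath($d);

$results = []; // hypothetical container for the extracted values
foreach ($xpath->query('//a') as $ad) {
    // The leading "." restricts each query to descendants of the current <a>.
    $img   = $xpath->query('.//img/@src', $ad)->item(0);
    $title = $xpath->query('.//div[@class="title"]', $ad)->item(0);

    $results[] = [
        'url'   => $ad->getAttribute('href'),
        'image' => $img ? $img->nodeValue : null,
        'title' => $title ? trim($title->nodeValue) : null,
    ];
}

print_r($results);
```

Because the nodes are never copied, this keeps a single `DOMXPath` object for the whole document, and the `item(0)` checks leave a `null` in place of any value that is missing.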

Answer 1: (score: 10)

I managed to get the code I needed (based on Khue Vu's code):

$d = new DOMDocument();
$d->loadHTML($ads); // the variable $ads contains the HTML code above
$xpath = new DOMXPath($d);
$ls_ads = $xpath->query('//a');

foreach ($ls_ads as $ad) {
    // get ad url
    $ad_url = $ad->getAttribute('href');

    // set current ad object as new DOMDocument object so we can parse it
    $ad_Doc = new DOMDocument();
    // importNode() with deep = true already returns a full copy, so no separate cloneNode() is needed
    $ad_Doc->appendChild($ad_Doc->importNode($ad, true));
    $xpath = new DOMXPath($ad_Doc);

    // get ad title
    $ad_title_tag = $xpath->query("//div[@class='title']");
    $ad_title = trim($ad_title_tag->item(0)->nodeValue);

    // get ad image
    $ad_image_tag = $xpath->query("//img/@src");
    $ad_image = $ad_image_tag->item(0)->nodeValue;
}
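
One thing to watch in the loop above: `item(0)` returns `null` when a query matches nothing, so reading `->nodeValue` from it fails for an ad that has no title or image. A hedged sketch of one way around this uses `DOMXPath::evaluate()` with the XPath functions `string()` and `normalize-space()`, which yield an empty string for an empty match. The sample markup (with a deliberately incomplete second ad) and the `$rows` name are assumptions made for illustration.

```php
<?php
// Sample markup (assumption): the second ad has neither an image nor a title.
$ads = '<div class="ads">'
     . '<a href="http://path/to/ad/1"><div class="ad">'
     . '<img src="http://path/to/ad/1/image.jpg">'
     . '<div class="title">  Ad #1  </div></div></a>'
     . '<a href="http://path/to/ad/2"><div class="ad"></div></a>'
     . '</div>';

libxml_use_internal_errors(true); // keep HTML parser warnings out of the output
$d = new DOMDocument();
$d->loadHTML($ads);
libxml_clear_errors();
$xpath = new DOMXPath($d);

$rows = []; // hypothetical container for the extracted values
foreach ($xpath->query('//a') as $ad) {
    $rows[] = [
        'url'   => $ad->getAttribute('href'),
        // string() converts the first matching node to text, or '' if none matches
        'image' => $xpath->evaluate('string(.//img/@src)', $ad),
        // normalize-space() also trims and collapses whitespace
        'title' => $xpath->evaluate('normalize-space(.//div[@class="title"])', $ad),
    ];
}

print_r($rows);
```

Here a missing element shows up as an empty string rather than an error, which may or may not be what you want; check for `''` if you need to tell "absent" apart from "present".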