PHP DOMDocument: loadHTML and getElementsByTagName return nothing

Date: 2013-07-09 12:48:04

Tags: php

$urlToScrap = "https://play.google.com/store/apps/details?id=flipboard.app#?t=W251bGwsMSwxLDIxMiwiZmxpcGJvYXJkLmFwcCJd";
$pageContentData = file_get_contents($urlToScrap);
$doc = new DOMDocument();
$doc->loadHTML($pageContentData);
$listOfDivs = $doc->getElementsByTagName("div");
foreach ($listOfDivs as $div) {
    if($div->getAttribute("class") == "doc-banner-icon"){
        $img = $div->getElementsByTagName("img");
        var_dump($img->getAttribute("src"));
    }
}

This returns nothing.

I have the following element in the DOM:

<div class="doc-banner-icon"><img src="somesrc"></div>

I am trying to get the img src. Because the page contains many images, I want to find the parent div first and then extract the image inside it.

Update: here is the solution, which iterates over the images inside the matching div:

$urlToScrap = "https://play.google.com/store/apps/details?id=flipboard.app#?t=W251bGwsMSwxLDIxMiwiZmxpcGJvYXJkLmFwcCJd";
$pageContentData = file_get_contents($urlToScrap);
$doc = new DOMDocument();
$doc->loadHTML($pageContentData);
$listOfDivs = $doc->getElementsByTagName("div");
foreach ($listOfDivs as $div) {
    if($div->getAttribute("class") == "doc-banner-icon"){
        $listOfImages = $div->getElementsByTagName("img");
        foreach($listOfImages as $img){
            var_dump($img->getAttribute("src"));
        }
    }
}

1 answer:

Answer 0 (score: 0)

You are not missing anything; var_dump simply does not work as expected on a DOMNodeList. Try this:

$listOfImages = $doc->getElementsByTagName("img");

foreach ($listOfImages as $img) {
    $imgClass = $img->getAttribute('class');

    echo $imgClass;
}

In your updated question, just change:

$img->getAttribute("src")

to:

$img->item(0)->getAttribute("src")
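
To see why item(0) is needed, here is a minimal sketch that runs without a network request. The inline $html string is a hypothetical stand-in for the fetched page, and $srcs is just an illustrative variable name. getElementsByTagName() returns a DOMNodeList, so an element has to be pulled out of the list before getAttribute() can be called on it.

```php
<?php
// Hypothetical inline markup standing in for the fetched page (assumption).
$html = '<html><body><div class="doc-banner-icon"><img src="somesrc"></div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$srcs = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') !== 'doc-banner-icon') {
        continue;
    }
    // getElementsByTagName() returns a DOMNodeList, not a single element.
    $images = $div->getElementsByTagName('img');
    // item(0) returns the first node, or null when the list is empty.
    $first = $images->item(0);
    if ($first !== null) {
        $srcs[] = $first->getAttribute('src');
    }
}

var_dump($srcs);
```

The null check is optional for this markup, but it avoids an error on pages where the div exists without an img child.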

Given that your selection criteria are fairly complex, you might consider using XPath instead of navigating the tree manually:

$doc = new DOMDocument();
$doc->loadHTML($pageContentData);

$xpath = new DOMXPath($doc);
$img = $xpath->query("//div[@class = 'doc-banner-icon']/img");

var_dump($img->item(0)->getAttribute('src'));
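
The query above compares the whole class attribute, so it would miss a div that carries additional classes. A slightly more defensive variant, sketched under assumptions: the inline $html, the extra "large" class, and the variable names are all hypothetical, and the class-token expression is a common XPath 1.0 idiom rather than anything specific to this page. It also checks for an empty result before dereferencing item(0).

```php
<?php
// Hypothetical inline markup; the extra "large" class is an assumption for illustration.
$html = '<html><body>'
      . '<div class="doc-banner-icon large"><img src="somesrc"></div>'
      . '<div class="other"><img src="ignored"></div>'
      . '</body></html>';

// Collect libxml parse warnings internally instead of emitting them,
// which helps with real-world pages that are not well-formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match the class as a whitespace-separated token rather than the whole attribute value.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' doc-banner-icon ')]/img";
$nodes = $xpath->query($query);

// item(0) is null when nothing matched, so check before calling getAttribute().
$first = $nodes->item(0);
$src = $first !== null ? $first->getAttribute('src') : null;

var_dump($src);
```

If the class attribute is known to contain exactly one value, the simpler @class = 'doc-banner-icon' comparison is enough.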