DomDocument_object->getElementsByTagName('a') returns an empty array when there are anchors on the page

Date: 2013-04-10 23:46:52

Tags: php html-parsing

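I'm using PHP's DOMDocument class to parse HTML.

When I give it HTML that contains anchors and ask it to find all of them and store them in an array, it gives me an empty array, as if there were no anchors.

Why does it do this, and how can I fix it?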

Here is the code:

$dom = new DOMDocument();
$domObject->loadHTML($content);
$anchors = $domObject->getElementsByTagName('a');
print_r($anchors); // returns empty array.

And $content looks like this:

     <p>
        Friend David, I do not think we shall need a call bell as Hello! can be heard 10 to 20 feet away. What you think? Edison - P.S. first cost of sender & receiver to manufacture is only $7.00.[12] Hello, hello! New York, hello!
       </p>
       <a href="http://the-irf.com/hello/hello5.html">Prev</a>
       <a href="hello7.html">Next</a>
       <a href="end.html">End</a>
    </body>
</html>

2 answers:

Answer 0 (score: 2)

Where is $domObject set? Try this:

$matchList = array();
$dom = new DOMDocument();
$dom->loadHTML($content);
$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $anchor) {
    array_push($matchList, $anchor->getAttribute('href'));
}
var_dump($matchList);
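If the real HTML is messier than the sample, loadHTML() may also print warnings about markup that libxml considers malformed (a bare `&`, like the one in "sender & receiver" above, is a common trigger). A minimal sketch that wraps the loop above in a function and buffers those warnings instead of printing them; the function name `collectHrefs` is hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of the DOM extension):
// returns the href of every <a> element in $html.
function collectHrefs(string $html): array
{
    // Buffer libxml parse warnings instead of emitting them as PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $hrefs = array();
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $hrefs[] = $anchor->getAttribute('href');
    }

    // Discard the buffered warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $hrefs;
}

var_dump(collectHrefs('<p>sender & receiver</p><a href="hello7.html">Next</a>'));
```

Restoring the previous libxml_use_internal_errors() value keeps the helper from silently changing error handling for the rest of the script.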

Answer 1 (score: 1)

Note that once the $dom / $domObject typo is fixed, the code does not return an empty array. Instead, print_r() prints:

DOMNodeList Object
(
)

This means it returned an object whose data is not exposed as ordinary public properties, so it looks empty in the print_r() output.

But the result is not empty. DOMNodeList is Traversable, so you can iterate over the result with foreach:

foreach($anchors as $anchor) {
    var_dump($anchor->nodeValue);
}
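Because DOMNodeList is traversable, iterator_to_array() can also copy it into a plain PHP array, which print_r() and var_dump() display as expected. A minimal sketch, with the sample markup shortened for brevity:

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<a href="hello7.html">Next</a><a href="end.html">End</a>');
$anchors = $dom->getElementsByTagName('a');

// Copy the DOMElement nodes into an ordinary array; passing false
// discards the iterator's keys so the result is zero-indexed.
$nodes = iterator_to_array($anchors, false);

// Map each element to its href attribute.
$hrefs = array_map(function ($anchor) {
    return $anchor->getAttribute('href');
}, $nodes);

print_r($hrefs);
```

The array holds the DOMElement objects themselves, so you still call getAttribute() or read nodeValue on each one.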

A simple way to check whether the result is empty is to look at the node list's length property:

echo "The query returned " . $anchors->length . " nodes";
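The length property also works as an explicit guard, and DOMNodeList::item() gives index-based access (it returns null for an out-of-range index). A small sketch; the function name `firstHref` is hypothetical:

```php
<?php
// Hypothetical helper: returns the href of the first <a> element,
// or null when the document contains no anchors.
function firstHref(DOMDocument $dom)
{
    $anchors = $dom->getElementsByTagName('a');

    // length is 0 when nothing matched; print_r() alone cannot show this.
    if ($anchors->length === 0) {
        return null;
    }

    // item(0) is the first match in document order.
    return $anchors->item(0)->getAttribute('href');
}

$dom = new DOMDocument();
$dom->loadHTML('<a href="http://the-irf.com/hello/hello5.html">Prev</a><a href="end.html">End</a>');
echo firstHref($dom), "\n";
```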

Here is a complete example:

$html = <<<EOF
<html>
  <head></head>
  <body>
     <p> 
        Friend David, I do not think we shall need a call bell as Hello! can be heard 10 to 20 feet away. What you think? Edison - P.S. first cost of sender & receiver to manufacture is only $7.00.[12] Hello, hello! New York, hello!
       </p>
       <a href="http://the-irf.com/hello/hello5.html">Prev</a>
       <a href="hello7.html">Next</a>
       <a href="end.html">End</a>
    </body>
</html>
EOF;

$domObject = new DOMDocument();
$domObject->loadHTML($html);
$anchors = $domObject->getElementsByTagName('a');

$links = array();
foreach($anchors as $anchor) {
    $links[] = $anchor->getAttribute('href');
}

var_dump($links);

Output:

string(36) "http://the-irf.com/hello/hello5.html"
string(11) "hello7.html"
string(8) "end.html"
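If you also need each link's visible text, the same loop can collect both values per anchor. A sketch that stores them as pairs rather than using the text as array keys, since link texts are not guaranteed to be unique:

```php
<?php
$html = '<p>Hello!</p>
<a href="http://the-irf.com/hello/hello5.html">Prev</a>
<a href="hello7.html">Next</a>
<a href="end.html">End</a>';

$domObject = new DOMDocument();
$domObject->loadHTML($html);

$links = array();
foreach ($domObject->getElementsByTagName('a') as $anchor) {
    // textContent is the element's text; trim() drops surrounding whitespace.
    $links[] = array(
        'text' => trim($anchor->textContent),
        'href' => $anchor->getAttribute('href'),
    );
}

print_r($links);
```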