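I am using PHP's DOMDocument class to parse HTML.
When I give it HTML that contains anchors and ask it to find all of them and store them in an array, it gives me an empty array, as if there were no anchors.
Why does it do this, and how can I fix it?
Here is the code: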
$dom = new DOMDocument();
$domObject->loadHTML($content);
$anchors = $domObject->getElementsByTagName('a');
print_r($anchors); // returns empty array.
And $content looks like this:
<p>
Friend David, I do not think we shall need a call bell as Hello! can be heard 10 to 20 feet away. What you think? Edison - P.S. first cost of sender & receiver to manufacture is only $7.00.[12] Hello, hello! New York, hello!
</p>
<a href="http://the-irf.com/hello/hello5.html">Prev</a>
<a href="hello7.html">Next</a>
<a href="end.html">End</a>
</body>
</html>
Answer 0 (score: 2)
Where is $domObject set? Try this:
$matchList = array();
$dom = new DOMDocument();
$dom->loadHTML($content);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $anchor) {
    array_push($matchList, $anchor->getAttribute('href'));
}
var_dump($matchList);
Answer 1 (score: 1)
Note that after fixing the $dom / $domObject typo, the code does not return an empty array. Instead, it returns:
DOMNodeList Object
(
)
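This means it returned an object that has only private properties, which is why it looks empty in print_r() output.
The result is not empty, though. DOMNodeList implements the Traversable interface, so you can iterate over the result: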
foreach ($anchors as $anchor) {
    var_dump($anchor->nodeValue);
}
A simple way to check whether the result is empty is to check the length of the node list:
echo "The query returned " . $anchors->length . " nodes";
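Since the question asks for the anchors in an array, here is a minimal sketch of reading the list size and copying the node list into a plain PHP array with iterator_to_array(). The sample markup and the variable names ($nodes, $texts) are illustrative assumptions, not part of the original code.

```php
<?php
// Illustrative markup; stands in for the $content from the question.
$html = '<html><body><a href="hello7.html">Next</a><a href="end.html">End</a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$anchors = $dom->getElementsByTagName('a');

// The node list reports its size through the length property.
$length = $anchors->length;

// iterator_to_array() copies the traversable node list into a plain array
// of DOMElement objects.
$nodes = iterator_to_array($anchors);

// Collect the link text of each anchor.
$texts = array();
foreach ($nodes as $node) {
    $texts[] = $node->nodeValue;
}

print_r($texts);
```

Because getElementsByTagName() returns a live list, copying it into an array also gives a stable snapshot if the document is modified afterwards.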
Here is a complete example:
$html = <<<EOF
<html>
<head></head>
<body>
<p>
Friend David, I do not think we shall need a call bell as Hello! can be heard 10 to 20 feet away. What you think? Edison - P.S. first cost of sender & receiver to manufacture is only $7.00.[12] Hello, hello! New York, hello!
</p>
<a href="http://the-irf.com/hello/hello5.html">Prev</a>
<a href="hello7.html">Next</a>
<a href="end.html">End</a>
</body>
</html>
EOF;
$domObject = new DOMDocument();
$domObject->loadHTML($html);
$anchors = $domObject->getElementsByTagName('a');
$links = array();
foreach ($anchors as $anchor) {
    $links[] = $anchor->getAttribute('href');
}
var_dump($links);
Output:
array(3) {
  [0]=>
  string(36) "http://the-irf.com/hello/hello5.html"
  [1]=>
  string(11) "hello7.html"
  [2]=>
  string(8) "end.html"
}
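Real-world HTML like the sample (note the unescaped & in the paragraph) may cause loadHTML() to emit libxml warnings, depending on the libxml version. The following is a sketch, not part of the original answers, of wrapping the extraction in a helper that collects those warnings internally instead of printing them. The function name extractHrefs and the test markup are hypothetical.

```php
<?php
/**
 * Hypothetical helper: returns the href attribute of every <a> element in $html.
 */
function extractHrefs($html)
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        // Skip anchors that have no href, e.g. <a name="top">.
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

$links = extractHrefs('<p>sender & receiver</p><a href="hello7.html">Next</a><a name="top">Top</a>');
print_r($links);
```

Restoring the previous libxml setting keeps the helper from changing error handling for the rest of the script.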