I am trying to extract links from an <a> tag nested inside a <div> element. I used PHP's DOM extension to parse the HTML, as explained at this site: [http://htmlparsing.com/php.html][1]. I also adapted the code to select elements by class name, following the answer in [Using PHP DOM document, to select HTML element by its class and get its text][2]. The HTML structure and PHP code are below. However, the PHP code does not seem to work properly: it stops printing links once it reaches the 11th element.
HTML structure:
<div class="avtar-abt">
<h3 class="mb6"><a href="testingwebsite.com1"></i></a></h3>
</div>
<div class="avtar-abt">
<h3 class="mb6"><a href="testingwebsite.com2"></i></a></h3>
</div>
<div class="avtar-abt">
<h3 class="mb6"><a href="testingwebsite.com3"></i></a></h3>
</div>
PHP code:
# Create a DOM parser object
$dom = new DOMDocument();
# Parse the HTML from Google.
# The @ before the method call suppresses any warnings that
# loadHTML might throw because of invalid HTML in the page.
@$dom->loadHTML($html);
$xpath = new DOMXPath ($dom);
$classname = 'mb6';
foreach ($xpath->query("//*[@class='$classname']/a") as $link) {
echo $link->getAttribute('href');
echo "<br />";
}
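Answer 0 (score: 1)
You shouldn't use two loops (the first loop has a syntax error, by the way). By adding /a to the search path, you can use XPath to access the link nodes directly: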
foreach ($xpath->query("//*[@class='$classname']/a") as $link) {
echo $link->getAttribute('href');
echo "<br />";
}
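A possible variation, offered as a minimal sketch rather than a confirmed fix: the predicate @class='mb6' only matches elements whose class attribute is exactly mb6, so any heading on the real page that carries an additional class would be skipped. Whether that explains the missing links is an assumption, since the full page is not shown. The sample markup below is modeled on the question, and the second class name "extra" is hypothetical. The sketch also uses libxml_use_internal_errors() instead of the @ operator to keep parser warnings out of the output.

```php
<?php
// Sample markup modeled on the question; the class name "extra" is hypothetical.
$html = <<<HTML
<div class="avtar-abt">
  <h3 class="mb6"><a href="testingwebsite.com1"></a></h3>
</div>
<div class="avtar-abt">
  <h3 class="mb6 extra"><a href="testingwebsite.com2"></a></h3>
</div>
HTML;

// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$classname = 'mb6';

// Match $classname as one token of a space-separated class attribute,
// so class="mb6 extra" is selected as well as class="mb6".
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]/a";

$links = [];
foreach ($xpath->query($query) as $link) {
    $links[] = $link->getAttribute('href');
}

echo implode("<br />\n", $links), "\n";
```

With the exact-match query from the answer, the second link in this sample would not be selected; the token-based predicate picks up both.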