How to extract child <a> tags and print them out

Time: 2016-08-03 21:48:17

Tags: php html dom

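I am trying to extract links from a child <a> tag that is inside a <div> element. I used PHP's DOM extension to parse the HTML, as explained at this site: [http://htmlparsing.com/php.html][1]. I also modified the code to select elements by class name, based on the relevant answer in [Using PHP DOM document, to select HTML element by its class and get its text][2]. The HTML structure and PHP code are below. However, the PHP code does not seem to work properly: it stops printing links once it reaches the 11th element.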

HTML structure:

    <div class="avtar-abt">
    <h3 class="mb6"><a href="testingwebsite.com1"></i></a></h3>
    </div>

    <div class="avtar-abt">
    <h3 class="mb6"><a href="testingwebsite.com2"></i></a></h3>
    </div>

    <div class="avtar-abt">
    <h3 class="mb6"><a href="testingwebsite.com3"></i></a></h3>
    </div>

PHP code:

    # Create a DOM parser object
    $dom = new DOMDocument();

    # Parse the HTML; $html is assumed to already hold the page source.
    # The @ before the method call suppresses any warnings that
    # loadHTML might throw because of invalid HTML in the page.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $classname = 'mb6';

    foreach ($xpath->query("//*[@class='$classname']/a") as $link) {
        echo $link->getAttribute('href');
        echo "<br />";
    }

1 answer:

Answer 0 (score: 1)

You should not use two loops (the first loop has a syntax error, BTW). By appending /a to the search path, you can access the link nodes directly with XPath:

    foreach ($xpath->query("//*[@class='$classname']/a") as $link) {
        echo $link->getAttribute('href');
        echo "<br />";
    }
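
The post does not show why output stops at the 11th element, so the following is only a sketch under stated assumptions, not a confirmed fix. Two common causes are that later elements carry more than one class (so the exact match `@class='mb6'` fails) or that the `<a>` is not a direct child of the matched element. The helper name `extractLinks` and the sample markup are hypothetical and are used here only for illustration.

```php
<?php
// Hypothetical helper (not from the original post): returns the href of every
// <a> nested anywhere inside an element whose class list contains $className.
// $className is assumed to be a plain class token without quotes or spaces.
function extractLinks(string $html, string $className): array
{
    $dom = new DOMDocument();

    // Let libxml collect parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Match $className as one token of the class attribute, so that
    // class="mb6 extra" is found as well as class="mb6". The //a step
    // selects descendants, not only direct children.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]//a[@href]",
        $className
    );

    $links = [];
    foreach ($xpath->query($query) as $link) {
        $links[] = $link->getAttribute('href');
    }
    return $links;
}

// Sample markup modelled on the question. The second block adds an extra class
// and a wrapper <span>, two cases the exact-match query would miss.
$html = <<<HTML
<div class="avtar-abt">
  <h3 class="mb6"><a href="testingwebsite.com1">one</a></h3>
</div>
<div class="avtar-abt">
  <h3 class="mb6 extra"><span><a href="testingwebsite.com2">two</a></span></h3>
</div>
HTML;

foreach (extractLinks($html, 'mb6') as $href) {
    echo $href, "<br />\n";
}
```

With the sample markup, both links are printed, whereas `//*[@class='mb6']/a` would return only the first. If the real page still stops early, inspecting `libxml_get_errors()` before clearing the errors may show where the parser struggled with the markup.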