Question

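I am trying to extract links from a child <a> tag that is inside a <div> element. I used PHP's DOM to parse the HTML, as explained at this site: [http://htmlparsing.com/php.html][1]. I also modified the code, based on a related answer in [Using PHP DOM document, to select HTML element by its class and get its text][2], so that it selects elements by class name. The HTML structure and PHP code are shown below. However, the PHP code does not seem to work well: it stops printing links once it reaches the 11th element.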

HTML structure:

    <div class="avtar-abt">
        <h3 class="mb6"><a href="testingwebsite.com1"></i></a></h3>
    </div>

    <div class="avtar-abt">
        <h3 class="mb6"><a href="testingwebsite.com2"></i></a></h3>
    </div>

    <div class="avtar-abt">
        <h3 class="mb6"><a href="testingwebsite.com3"></i></a></h3>
    </div>

PHP code:

    # Create a DOM parser object
    $dom = new DOMDocument();

    # Parse the HTML from Google.
    # The @ before the method call suppresses any warnings that
    # loadHTML might throw because of invalid HTML in the page.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $classname = 'mb6';

    foreach ($xpath->query("//*[@class='$classname']/a") as $link) {
        echo $link->getAttribute('href');
        echo "<br />";
    }

Answer 1

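You shouldn't use two loops (the first loop has a syntax error, by the way). By adding /a to the search path, you can access the link nodes directly with XPath: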

    foreach ($xpath->query("//*[@class='$classname']/a") as $link) {
        echo $link->getAttribute('href');
        echo "<br />";
    }
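
For reference, here is a minimal self-contained sketch of the same idea that can be run on its own. The sample markup, the helper name `extractLinks`, and the extra class `wide` are assumptions made up for illustration; they are not from the original page. The sketch also swaps in two variations: `libxml_use_internal_errors()` instead of `@`, and a token-based class test instead of the exact comparison `@class='mb6'`, which only matches when the attribute is exactly `mb6`. Whether extra classes are the reason the output stops at the 11th element is only a guess.

```php
<?php
// Sample markup modeled on the question. The third heading carries an
// extra, purely hypothetical class name ("wide").
$html = '<div class="avtar-abt"><h3 class="mb6"><a href="testingwebsite.com1"></a></h3></div>'
      . '<div class="avtar-abt"><h3 class="mb6"><a href="testingwebsite.com2"></a></h3></div>'
      . '<div class="avtar-abt"><h3 class="mb6 wide"><a href="testingwebsite.com3"></a></h3></div>';

// Hypothetical helper: returns the href of every <a> that is a direct
// child of an element whose class list contains $classname.
// $classname is assumed to be a trusted literal, since it is placed
// directly into the XPath expression.
function extractLinks(string $html, string $classname): array
{
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match $classname as one whitespace-separated token of the class
    // attribute, so class="mb6 wide" matches as well as class="mb6".
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]/a";

    $links = [];
    foreach ($xpath->query($query) as $link) {
        $links[] = $link->getAttribute('href');
    }
    return $links;
}

// Print one link per line, as in the loop above.
foreach (extractLinks($html, 'mb6') as $href) {
    echo $href, "<br />\n";
}
```

If a link can sit deeper than one level below the matched element, ending the query with `//a` instead of `/a` selects all descendant links rather than only direct children.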
