DOMXPath / DOMDocument - Get divs inside a comment block

Date: 2014-10-29 04:18:55

Tags: php html xpath domdocument

Let's say I have this comment block containing HTML:

<html>
<body>

<code class="hidden">
<!-- 
    <div class="a">

        <div class="b">

            <div class="c">
                <a href="link">Link Test 1</a>
            </div>

            <div class="c">
                <a href="link">Link Test 2</a>
            </div>

            <div class="c">
                <a href="link">Link Test 3</a>
            </div>

        </div>

    </div>
-->
</code>

<code>
     <!-- test -->
</code>

</body>
</html>

Using DOMXPath in PHP, how do I get the links and their text from inside the commented-out markup?

This is what I have so far:

    $dom = new DOMDocument();
    $dom->loadHTML("HTML STRING"); # not actually in code
    $xpath = new DOMXPath($dom);
    $query = '/html/body/code/comment()';
    $divs = $dom->getElementsByTagName('div')->item(0);

    $entries = $xpath->query($query, $divs);

    foreach($entries as $entry) {

        # shows entire text block
        echo $entry->textContent;

    }

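How do I navigate further so that I can get the elements with class "c" and then put their links into an array?

Edit: Please note that there are multiple <code> tags in the page, so I can't just grab an element by the code tag.

1 answer:

Answer 0: (score: 1)

You can already target the comment that contains the links. Take its content, load it as HTML in its own document, and run a second query inside it. For example: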

$sample_markup = '<html>
<body>

<code class="hidden">
<!--
    <div class="a">

        <div class="b">

            <div class="c">
                <a href="link">Link Test 1</a>
            </div>

            <div class="c">
                <a href="link">Link Test 2</a>
            </div>

            <div class="c">
                <a href="link">Link Test 3</a>
            </div>

        </div>

    </div>
-->
</code>

</body>
</html>';
$dom = new DOMDocument();
$dom->loadHTML($sample_markup);
$xpath = new DOMXPath($dom);
// select every comment node that sits directly inside a <code> element
$query = '/html/body/code/comment()';
$entries = $xpath->query($query);
foreach ($entries as $key => $comment) {
    // the comment body is plain text to the outer document, so parse it as HTML on its own
    $value = $comment->nodeValue;
    $html_comment = new DOMDocument();
    $html_comment->loadHTML($value);
    $xpath_sub = new DOMXPath($html_comment);
    $links = $xpath_sub->query('//div[@class="c"]/a'); // target the links!
    // loop each link, do what you have to do
    foreach($links as $link) {
        echo $link->getAttribute('href') . '<br/>';
    }
}
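
The question also asks for the link text and for the results in an array, and mentions that the page has more than one <code> tag. A minimal sketch of that variant follows. It assumes the wanted comment always sits in a <code class="hidden"> element, as in the sample; the helper name extractCommentedLinks and the shape of the returned array are made up for illustration and are not part of the original answer.

```php
<?php
// Sample page taken from the question (condensed): two <code> elements,
// only the first one holds the commented-out markup we want.
$page = <<<'HTML'
<html>
<body>
<code class="hidden">
<!--
    <div class="a">
        <div class="b">
            <div class="c"><a href="link">Link Test 1</a></div>
            <div class="c"><a href="link">Link Test 2</a></div>
            <div class="c"><a href="link">Link Test 3</a></div>
        </div>
    </div>
-->
</code>
<code>
     <!-- test -->
</code>
</body>
</html>
HTML;

// Hypothetical helper: returns a list of ['href' => ..., 'text' => ...] entries.
function extractCommentedLinks(string $html): array
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $links = [];
    // Assumption: only comments inside <code class="hidden"> are of interest.
    foreach ($xpath->query('//code[@class="hidden"]/comment()') as $comment) {
        $fragment = trim($comment->nodeValue);
        if ($fragment === '') {
            continue; // nothing to parse in an empty comment
        }

        // Parse the comment body as its own HTML document.
        $inner = new DOMDocument();
        $inner->loadHTML($fragment);
        $innerXpath = new DOMXPath($inner);

        foreach ($innerXpath->query('//div[@class="c"]/a') as $a) {
            $links[] = [
                'href' => $a->getAttribute('href'),
                'text' => trim($a->textContent),
            ];
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $links;
}

$links = extractCommentedLinks($page);
print_r($links);
```

Filtering on the class attribute in the outer query skips the second <code> element entirely, so its "test" comment is never parsed. If the real page marks the block differently, the outer XPath expression is the only part that should need changing.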