DOMDocument: how do I get elements from a node?

Date: 2016-06-13 11:46:25

Tags: php parsing domdocument

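$url = file_get_contents('test.html'); // despite its name, $url holds the HTML source

$DOM = new DOMDocument();
$DOM->loadHTML(mb_convert_encoding($url, 'HTML-ENTITIES', 'UTF-8'));
$trs = $DOM->getElementsByTagName('tr');
foreach ($trs as $tr) {
    // childNodes of each <tr>: its <td> cells plus the whitespace text nodes between them
    foreach ($tr->childNodes as $td){
        echo ' ' .$td->nodeValue;
    }
}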

test.html:

<html>
<body>
    <table>
    <tbody>
    <tr>
        <td style="background-color: #FFFF80;">1</td>
        <td><a href="test1.php" title="test1">test1</a></td>
    </tr>
    <tr>
        <td style="background-color: #FFFF80;">2</td>
        <td><a href="test2.php" title="test2">test2</a></td>
    </tr>
    <tr>
        <td style="background-color: #FFFF80;">3</td>
        <td><a href="test3.php" title="test3">test3</a></td>
    </tr>
    </tbody>
    </table>
</body>
</html>

The result I get:

1 test1 2 test2 3 test3

But how do I get the link from the <a> inside a <td>?

And how do I get the HTML of a <td>?

P.S.: I tried $td->find('a'); and $td->getElementsByTagName('a');, but neither works...
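On the P.S.: DOMElement has no find() method, and $td->getElementsByTagName('a') fails because $tr->childNodes also returns the whitespace between the cells as DOMText nodes, which have no getElementsByTagName() method. A minimal sketch, assuming the markup is inlined as a string (a trimmed copy of test.html) instead of being read from disk: keep only the <td> elements, then call getElementsByTagName('a') on each of them.

```php
<?php
// Trimmed copy of test.html, inlined so the sketch runs on its own.
$html = '<html><body><table><tbody>
<tr>
    <td style="background-color: #FFFF80;">1</td>
    <td><a href="test1.php" title="test1">test1</a></td>
</tr>
<tr>
    <td style="background-color: #FFFF80;">2</td>
    <td><a href="test2.php" title="test2">test2</a></td>
</tr>
</tbody></table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$links = [];
foreach ($dom->getElementsByTagName('tr') as $tr) {
    foreach ($tr->childNodes as $td) {
        // childNodes also yields whitespace DOMText nodes between the cells;
        // only DOMElement offers getElementsByTagName(), so skip everything else.
        if (!($td instanceof DOMElement) || $td->nodeName !== 'td') {
            continue;
        }
        // Every <a> below this <td>, however deeply nested.
        foreach ($td->getElementsByTagName('a') as $a) {
            $links[] = $a->getAttribute('href');
        }
    }
}

echo implode("\n", $links), "\n";
```

Filtering on instanceof DOMElement is what makes the asker's getElementsByTagName('a') call safe; the loop structure otherwise stays the same as in the question.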

1 answer:

Answer 0 (score: 2)

I improved your code; this version works fine for me:

$url = file_get_contents('test.html');

$DOM = new DOMDocument();
$DOM->loadHTML(mb_convert_encoding($url, 'HTML-ENTITIES', 'UTF-8'));
$trs = $DOM->getElementsByTagName('tr');
foreach ($trs as $tr) {
    foreach ($tr->childNodes as $td) {
        if ($td->hasChildNodes()) { // check whether the <td> has child nodes
            foreach ($td->childNodes as $i) {
                if ($i->hasAttributes()) { // check whether the child node has attributes
                    echo $i->getAttribute("href") . "\n"; // read its href="" attribute
                }
            }
        }
    }
}
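The same hrefs can also be selected with a single XPath query instead of nested loops. A sketch, assuming test.html's structure (inlined below, styles omitted) and using PHP's built-in DOMXPath class; the query //tr/td/a/@href is an illustrative choice, not part of the answer above.

```php
<?php
// test.html's rows, inlined (styles omitted) so the sketch is self-contained.
$html = '<html><body><table><tbody>
<tr><td>1</td><td><a href="test1.php" title="test1">test1</a></td></tr>
<tr><td>2</td><td><a href="test2.php" title="test2">test2</a></td></tr>
<tr><td>3</td><td><a href="test3.php" title="test3">test3</a></td></tr>
</tbody></table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$hrefs = [];
// Every href attribute of an <a> that is a direct child of a <td> inside a <tr>.
foreach ($xpath->query('//tr/td/a/@href') as $attr) {
    $hrefs[] = $attr->value; // each match is a DOMAttr
}

echo implode("\n", $hrefs), "\n";
```

Unlike the hasAttributes() check, the query only matches <a> elements that actually carry an href, so an attribute-bearing element without one is never echoed as an empty line.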

Result:

test1.php
test2.php
test3.php
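The answer covers only the link. For the second question (the HTML of a <td>), one common approach is to pass each child node to DOMDocument::saveHTML(), which accepts an optional node argument (PHP 5.3.6+) and then serialises only that node. A minimal sketch; innerHtml() is a hypothetical helper name, not a built-in method.

```php
<?php
// Hypothetical helper: concatenates the serialised HTML of every child of $node.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) serialises just this child, not the whole document.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><table><tr><td><a href="test1.php" title="test1">test1</a></td></tr></table></body></html>');

$td = $dom->getElementsByTagName('td')->item(0);
echo innerHtml($td), "\n";      // only the contents of the <td>
echo $dom->saveHTML($td), "\n"; // the <td> element itself, tags included
```

Use innerHtml() when only the cell's contents are wanted, and saveHTML($td) directly when the surrounding <td> tags should be kept.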