Getting tags in DOMDocument

Date: 2014-09-04 14:52:40

Tags: php html domdocument

I am trying to get the HTML markup of a table in a page:

$new_dom = new DOMDocument();

$table = '';

$nodesTable = $this->dom->getElementsbyTagName("table");

foreach($nodesTable as $nodeTable){
    $color = $nodeTable->getAttribute('bordercolordark');
    if ($color == '#73BAFF') {
        $table = $nodeTable;
    }
}

$new_dom->appendChild($table);

echo $new_dom->saveHTML();

This is somepage.html:

<html>
<table>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
</table>

<table border="1" cellpadding="0" width="500" bordercolorlight="#ACD6FF" bordercolordark="#73BAFF" align="center">
    <tr>
        <td rowspan="2" colspan="2" bgcolor="#73BAFF"> </td>
        <td colspan="3" align="center" bgcolor="#ACD6FF"> Element 1 </td>
        <td colspan="3" align="center" bgcolor="#ACD6FF"> Element 2 </td>
    </tr>
    <tr>
        <td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
        <td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
        <td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
        <td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
        <td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
        <td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
    </tr>
    <tr>
        <td bgcolor="#ACD6FF" width="155" align="center"> Row 1</td>
        <td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
        <td align="center"> 50 </td>
        <td align="center"> 50 </td>
        <td align="center"> 50 </td>
        <td align="center"> 50 </td>
        <td align="center"> 50 </td>
        <td align="center"> 50 </td>
    </tr>
    <tr>
        <td bgcolor="#ACD6FF" width="155" align="center"> Row 2</td>
        <td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
        <td align="center"> 60 </td>
        <td align="center"> 60 </td>
        <td align="center"> 60 </td>
        <td align="center"> 60 </td>
        <td align="center"> 60 </td>
        <td align="center"> 60 </td>
    </tr>
    <tr>
        <td bgcolor="#ACD6FF" width="155" align="center"> Row 3</td>
        <td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
        <td align="center"> 70 </td>
        <td align="center"> 70 </td>
        <td align="center"> 70 </td>
        <td align="center"> 70 </td>
        <td align="center"> 70 </td>
        <td align="center"> 70 </td>
    </tr>
</table>

<table>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
</table>

<table>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
    <tr> <td> 10 </td> </tr>
</table>

</html>

$new_dom only outputs \n instead of the HTML markup. I tried looking at this answer: PHP DOMDocument stripping HTML tags, but appending the table that way does not work either.

1 answer:

Answer 0 (score: 2)

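Appending a node that belongs to another document fails with:

Fatal error: Uncaught exception 'DOMException' with message 'Wrong Document Error'

So you cannot move a node from one document to another. If you want to do that, you have to use importNode() with the deep flag.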

$dom = new DOMDocument();
$dom->loadHTMLFile('x.html');
$new_dom = new DOMDocument();

$table = null;

$nodesTable = $dom->getElementsByTagName("table");

foreach ($nodesTable as $nodeTable) {
    $color = $nodeTable->getAttribute('bordercolordark');
    if ($color == '#73BAFF') {
        // Deep import (second argument true) copies the table and its children.
        $table = $new_dom->importNode($nodeTable, true);
    }
}

// Append only if a matching table was found.
if ($table !== null) {
    $new_dom->appendChild($table);
}

echo $new_dom->saveHTML();

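Without the deep flag (second argument set to true), importNode() imports only the table element, not its child elements.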
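To see what the deep flag changes, here is a minimal sketch that imports the same table twice, once shallow and once deep. The inline `$html` string is a hypothetical stand-in for the real page, and the variable names (`$source`, `$shallowDoc`, `$deepDoc`) are made up for illustration.

```php
<?php
// Hypothetical inline markup standing in for the real page.
$html = '<html><body><table bordercolordark="#73BAFF">'
      . '<tr><td>50</td></tr></table></body></html>';

$source = new DOMDocument();
$source->loadHTML($html);

// Take the first (and only) table from the source document.
$table = $source->getElementsByTagName('table')->item(0);

// Shallow import: the <table> element and its attributes, no children.
$shallowDoc = new DOMDocument();
$shallowDoc->appendChild($shallowDoc->importNode($table, false));
$shallowHtml = $shallowDoc->saveHTML();

// Deep import: the <table> element together with all of its descendants.
$deepDoc = new DOMDocument();
$deepDoc->appendChild($deepDoc->importNode($table, true));
$deepHtml = $deepDoc->saveHTML();

echo $shallowHtml;
echo $deepHtml;
```

In both cases the node is copied, so the source document keeps its original table.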

Note: I disabled the entity loader with libxml_disable_entity_loader(true); in your case. I am not sure whether XXE attacks also apply to loadHTML(), but it is there just to be safe.