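Parse an HTML table by column, NOT by row

Time: 2012-06-25 16:06:40

Tags: php html parsing dom

How can I parse the td elements of an HTML table by column rather than by row (using DOMDocument / DOMXPath)? That is, I want to process each td according to the table column it belongs to, not the tr it sits in.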

1 answer:

Answer 0: (score: 2)

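HTML tables are grouped by rows, not by columns. Your best bet is to iterate over the rows and assign each cell to the array for its column, keeping the arrays in sync by key.

Example: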

<?php

$html = <<<HTML
<table id="target">
    <tr>
        <td>Col1</td>
        <td>Col2</td>
        <td>Col3</td>
    </tr>
    <tr>
        <td>Col1</td>
        <td>Col2</td>
        <td>Col3</td>
    </tr>
    <tr>
        <td>Col1</td>
        <td>Col2</td>
        <td>Col3</td>
    </tr>
    <tr>
        <td>Col1</td>
        <td>Col2</td>
        <td>Col3</td>
    </tr>
</table>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);

$table = $document->getElementById("target");
$tr_list = $table->getElementsByTagName("tr");

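$tr_count = 0;
$columns = array();
foreach ($tr_list as $tr) {
    /** @var $tr DOMElement */
    $td_list = $tr->getElementsByTagName("td");
    $td_count = 0;
    foreach ($td_list as $td) {
        // Key by column index first, then by row index,
        // so that $columns[n] holds every cell of column n.
        $columns[$td_count][$tr_count] = $td;
        $td_count++;
    }
    // Advance the row index only after the row's cells are stored (rows start at 0).
    $tr_count++;
}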

print_r($columns);
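
Since the question also mentions DOMXPath, here is a minimal sketch of an alternative for when only one column is needed: an XPath positional predicate such as td[2] selects the second td child of every tr. The helper name column_values and the sample cell values are assumptions made for illustration and are not part of the answer above. The sketch also assumes the table has no colspan or rowspan attributes (which would shift cell positions) and that the table id contains no double quotes.

```php
<?php

// Hypothetical helper (not from the original answer): returns the trimmed
// text of every cell in one column of the table with the given id.
function column_values(DOMDocument $document, string $table_id, int $column): array
{
    $xpath = new DOMXPath($document);
    // td[N] matches the Nth td child of each tr; XPath positions start at 1.
    $query = sprintf('//table[@id="%s"]//tr/td[%d]', $table_id, $column);
    $values = array();
    foreach ($xpath->query($query) as $td) {
        $values[] = trim($td->textContent);
    }
    return $values;
}

// Sample data, assumed for illustration only.
$html = <<<HTML
<table id="target">
    <tr><td>A1</td><td>B1</td><td>C1</td></tr>
    <tr><td>A2</td><td>B2</td><td>C2</td></tr>
</table>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);

// Collect the second column; with the sample data this holds "B1" and "B2".
$second_column = column_values($document, "target", 2);
print_r($second_column);
```

Compared with the nested loops, this leaves the column selection to the XPath engine, but it runs one query per column, so the loop-based approach is likely the better fit when every column is needed at once.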