Parsing an HTML table whose cells contain HTML (such as IMG tags) into a PHP array

Asked: 2017-02-28 19:16:38

Tags: php html dom multidimensional-array

I have a simple table:

<table>
  <tr>
    <td>T1 Row 1 Col 1</td>
    <td>T1 Row 1 Col 2 <IMG SRC="someimage.png" TITLE="sometitle" /></td>
    <td>T1 <a href="somelink.htm">Row</a> 1 Col 3</td>
  </tr>
  <tr>
    <td><div class="someclass">T1 Row 2 Col 1</div></td>
    <td>T1 Row 2 Col 2</td>
    <td>T1 Row 2 Col 3</td>
  </tr>
</table>

I need to parse it into a PHP array so that:

$arr[0][0][0]; //would equal "T1 Row 1 Col 1"
$arr[0][0][1]; //would equal "T1 Row 1 Col 2 <IMG SRC="someimage.png" TITLE="sometitle" />"
$arr[0][0][2]; //would equal "T1 <a href="somelink.htm">Row</a> 1 Col 3"
$arr[0][1][0]; //would equal "<div class="someclass">T1 Row 2 Col 1</div>"

I tried the DOM approach:

$dom = new DOMDocument;  
$html = $dom->loadHTML($HTMLTable);  
$tables = $dom->getElementsByTagName('table');   
$rows = $tables->item(0)->getElementsByTagName('tr');   
$cols = $rows->item(0)->getElementsByTagName('th');   
$row_headers = NULL;
foreach ($cols as $node) {
    $row_headers[] = $node->innerHTML;
}   
$table = array();
$rows = $tables->item(0)->getElementsByTagName('tr');   
foreach ($rows as $row) {   
    $cols = $row->getElementsByTagName('td');   
    $row = array();
    $i=0;
    foreach ($cols as $node) {
        # code...
        if($row_headers==NULL)
            $row[] = $node->nodeValue;
        else
            $row[$row_headers[$i]] = $node->innerHTML;
        $i++;
    }   
    $table[] = $row;
}   

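But there seems to be no way to extract the contents of a TD cell verbatim, with its HTML intact. It always returns only the text and ignores any image or div markup. I have tried several things, such as nodeValue, textContent, plaintext, and innerHTML. I may be missing something obvious, so any suggestions would be greatly appreciated.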
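
For reference, one possible direction is sketched below. PHP's DOMNode has no innerHTML property, and nodeValue and textContent return text only, which matches the behaviour described above. A common workaround is to serialize each child of a cell with DOMDocument::saveHTML() and concatenate the results. The helper names innerHtml and tablesToArray are hypothetical and not part of the DOM extension. The sketch also assumes PHP 7 or later and no nested tables. The serializer normalizes markup (for example, tag and attribute names come back lowercased), so the result is equivalent HTML rather than a byte-for-byte copy of the source.

```php
<?php
// Hypothetical helper: returns the markup inside a node by serializing
// each of its children with DOMDocument::saveHTML($node).
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Hypothetical helper: builds $result[table][row][column] from every
// <table> in the input. Assumes tables are not nested, because
// getElementsByTagName('tr') would also return rows of inner tables.
function tablesToArray(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tables = [];
    foreach ($dom->getElementsByTagName('table') as $table) {
        $rows = [];
        foreach ($table->getElementsByTagName('tr') as $tr) {
            $cells = [];
            foreach ($tr->childNodes as $cell) {
                // Skip whitespace text nodes; keep only td and th elements.
                if ($cell instanceof DOMElement
                    && in_array($cell->nodeName, ['td', 'th'], true)) {
                    $cells[] = trim(innerHtml($cell));
                }
            }
            $rows[] = $cells;
        }
        $tables[] = $rows;
    }
    return $tables;
}

// The sample table from the question.
$HTMLTable = <<<'HTML'
<table>
  <tr>
    <td>T1 Row 1 Col 1</td>
    <td>T1 Row 1 Col 2 <IMG SRC="someimage.png" TITLE="sometitle" /></td>
    <td>T1 <a href="somelink.htm">Row</a> 1 Col 3</td>
  </tr>
  <tr>
    <td><div class="someclass">T1 Row 2 Col 1</div></td>
    <td>T1 Row 2 Col 2</td>
    <td>T1 Row 2 Col 3</td>
  </tr>
</table>
HTML;

$arr = tablesToArray($HTMLTable);
echo $arr[0][0][1], "\n"; // cell text followed by the serialized img tag
echo $arr[0][1][0], "\n"; // the div element including its markup
```

Here the first index selects the table, the second the row, and the third the column, matching the $arr[0][0][0] layout requested above.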

0 answers:

No answers