I have a simple table:
<table>
<tr>
<td>T1 Row 1 Col 1</td>
<td>T1 Row 1 Col 2 <IMG SRC="someimage.png" TITLE="sometitle" /></td>
<td>T1 <a href="somelink.htm">Row</a> 1 Col 3</td>
</tr>
<tr>
<td><div class="someclass">T1 Row 2 Col 1</div></td>
<td>T1 Row 2 Col 2</td>
<td>T1 Row 2 Col 3</td>
</tr>
</table>
I need to parse it into a PHP array so that:
$arr[0][0][0]; //would equal "T1 Row 1 Col 1"
$arr[0][0][1]; //would equal "T1 Row 1 Col 2 <IMG SRC="someimage.png" TITLE="sometitle" />"
$arr[0][0][2]; //would equal "T1 <a href="somelink.htm">Row</a> 1 Col 3"
$arr[0][1][0]; //would equal "<div class="someclass">T1 Row 2 Col 1</div>"
I tried the DOM approach:
$dom = new DOMDocument;
$html = $dom->loadHTML($HTMLTable);
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
$cols = $rows->item(0)->getElementsByTagName('th');
$row_headers = NULL;
foreach ($cols as $node) {
    $row_headers[] = $node->innerHTML;
}
$table = array();
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');
    $row = array();
    $i = 0;
    foreach ($cols as $node) {
        # code...
        if ($row_headers == NULL)
            $row[] = $node->nodeValue;
        else
            $row[$row_headers[$i]] = $node->innerHTML;
        $i++;
    }
    $table[] = $row;
}
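But there seems to be no way to extract the contents of a TD cell verbatim, with its HTML intact. It always returns only the text and ignores any image or div markup. I have tried several things, such as nodeValue, textContent, plaintext, and innerHTML. I am probably missing something obvious, so any suggestions would be greatly appreciated.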
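One direction that might work, offered only as a minimal sketch: `DOMNode` has no `innerHTML` property in PHP's DOM extension, but `DOMDocument::saveHTML()` accepts a node argument and serializes just that node. Concatenating the serialized children of a cell should therefore give its inner markup. The helper names `innerHtml` and `tablesToArray` and the shortened sample table are hypothetical, introduced here for illustration.

```php
<?php
// Hypothetical helper: returns the markup inside $node, without the node's own tag.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return trim($html);
}

// Hypothetical helper: builds $arr[table][row][column] => inner HTML of each cell.
function tablesToArray(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $arr = [];
    foreach ($dom->getElementsByTagName('table') as $table) {
        $rows = [];
        // Assumption: no nested tables; getElementsByTagName() would also
        // return the rows of any table nested inside this one.
        foreach ($table->getElementsByTagName('tr') as $tr) {
            $cells = [];
            foreach ($tr->childNodes as $cell) {
                // Keep only direct td/th element children, skipping whitespace text nodes.
                if ($cell instanceof DOMElement
                    && in_array(strtolower($cell->nodeName), ['td', 'th'], true)) {
                    $cells[] = innerHtml($cell);
                }
            }
            $rows[] = $cells;
        }
        $arr[] = $rows;
    }
    return $arr;
}

// Shortened sample input, for illustration only.
$HTMLTable = '<table><tr><td>T1 Row 1 Col 1</td></tr>'
           . '<tr><td><div class="someclass">T1 Row 2 Col 1</div></td></tr></table>';
$arr = tablesToArray($HTMLTable);
echo $arr[0][1][0], "\n"; // prints <div class="someclass">T1 Row 2 Col 1</div>
```

Note that the result is re-serialized by libxml rather than copied from the input, so it may not be byte-for-byte identical to the original: tag and attribute names such as `IMG SRC` are likely to come back normalized, and the trailing slash of a self-closing tag may be dropped.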