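I'm trying to read a table from an HTML file into an array, and I'm stuck. Any help would be appreciated.
Each table element should be stored in one array value.
Example: $arr[1] = DER HE1 ges 1
PHP: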
<?php
libxml_use_internal_errors(true);
$i = 0;
// new DOM object
$dom = new DOMDocument();
// discard white space (set before loading)
$dom->preserveWhiteSpace = false;
// load the HTML
$dom->loadHTMLFile("106642new.html");
// get the tables by tag name
$tables = $dom->getElementsByTagName('table');
// get all rows from the first table
$rows = $tables->item(0)->getElementsByTagName('tr');
// loop over the table rows
foreach ($rows as $row) {
    // get each column by tag name
    $cols = $row->getElementsByTagName('td');
    $i++;
    // echo the value of the first cell in the row
    $value = "Nummer: " . $i . ": " . $cols->item(0)->nodeValue . PHP_EOL;
    echo $value;
}
?>
HTML:
<body bgcolor="#FFFFFF" topmargin="0" leftmargin="0" marginwidth="0" marginheight="0">
<div align=left>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0 WIDTH="100%" height="100%">
<tr><td valign="top"> </td></tr>
<tr><td valign="top">
<p font class="Header">Basisrooster schooljaar 2011 2012 (m.i.v. 12-09-11)</font></p>
<br><div font class="lNameHeader"> </font> </div><table border=1>
<tr class="AccentDark">
<td align="left" width="65" class="tableHeader"></td>
<td align="center" width="auto" class="tableHeader">Maandag</td>
<td align="center" width="auto" class="tableHeader">Dinsdag</td>
<td align="center" width="auto" class="tableHeader">Woensdag</td>
<td align="center" width="auto" class="tableHeader">Donderdag</td>
<td align="center" width="auto" class="tableHeader">Vrijdag</td>
</tr><tr>
<td align="left" width="50" class="tableHeader">1e uur</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell"></td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell"></td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell"></td>
<td align="left" width="9" class="tableCell"></td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">WAS</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HE09</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">econ</td>
<td align="left" width="9" class="tableCell">5</td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">WIK</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HC17</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">biol</td>
<td align="left" width="9" class="tableCell">4</td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">OTT</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HC01</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">dutl</td>
<td align="left" width="9" class="tableCell">6</td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell"></td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell"></td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell"></td>
<td align="left" width="9" class="tableCell"></td>
</tr>
</table>
</td>
</tr>
<tr>
<td align="left" width="50" class="tableHeader">2e uur</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">KEJ</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HC02</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">wisA</td>
<td align="left" width="9" class="tableCell">3</td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">BRT</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HE05</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">netl</td>
<td align="left" width="9" class="tableCell"></td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">OTT</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HC01</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">dutl</td>
<td align="left" width="9" class="tableCell">6</td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">BAU</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HG01</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">lo</td>
<td align="left" width="9" class="tableCell"></td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">MET</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HD02</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">entl</td>
<td align="left" width="9" class="tableCell"></td>
</tr>
</table>
</td>
</tr>
<tr>
<td align="left" width="50" class="tableHeader">3e uur</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">WAS</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HE07</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">econ</td>
<td align="left" width="9" class="tableCell">5</td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">MET</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HD02</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">entl</td>
<td align="left" width="9" class="tableCell"></td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">WAS</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HE05</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">econ</td>
<td align="left" width="9" class="tableCell">5</td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">BAU</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HG01</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">lo</td>
<td align="left" width="9" class="tableCell"></td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">KEJ</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HC02</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">wisA</td>
<td align="left" width="9" class="tableCell">3</td>
</tr>
</table>
</td>
</tr>
<tr>
<td align="left" width="50" class="tableHeader">4e uur</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell"></td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell"></td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell"></td>
<td align="left" width="9" class="tableCell"></td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">DER</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HE08</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">ges</td>
<td align="left" width="9" class="tableCell">1</td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">KEJ</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HC06</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">wisA</td>
<td align="left" width="9" class="tableCell">3</td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">DER</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HE10</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">ges</td>
<td align="left" width="9" class="tableCell">1</td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">CHR</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HB15</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">ckv</td>
<td align="left" width="9" class="tableCell"></td>
</tr>
</table>
</td>
</tr>
<tr>
<td align="left" width="50" class="tableHeader">5e uur</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">DOC</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HE09</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">m&o</td>
<td align="left" width="9" class="tableCell">2</td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell"></td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell"></td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell"></td>
<td align="left" width="9" class="tableCell"></td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">MET</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HD02</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">entl</td>
<td align="left" width="9" class="tableCell"></td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">BRT</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HE05</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">netl</td>
<td align="left" width="9" class="tableCell"></td>
</tr>
</table>
</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">OTT</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HC03</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">dutl</td>
<td align="left" width="9" class="tableCell">6</td>
</tr>
</table>
</td>
</tr>
<tr>
<td align="left" width="50" class="tableHeader">6e uur</td>
<td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
<tr>
<td align="left" width="41" class="tableCell">OTT</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="75" class="tableCell">HC03</td>
<td align="left" width="3" class="tableCell"> </td>
<td align="left" width="73" class="tableCell">dutl</td>
<td align="left" width="9" class="tableCell">6</td>
</tr>
</table>
</td>
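Answer 0 (score: 1)
I think the problem is that your first table is a container for the other tables. If you want the contents of all the tables, you also need to loop over the list of tables.
If you only want the contents of the inner tables, locate them in the DOM first. I suggest finding the first table, then looking up all the table elements inside it and iterating over them.
var_dump is a good starting point for debugging. You don't need anything beyond what you've already done, just debug and test some more :)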
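A minimal sketch of that idea, assuming the innermost tables (the ones holding teacher, room, subject, and group) are the ones you want as array values. The sample markup is a trimmed-down, hypothetical stand-in for the file in the question, and the XPath expression `//table[not(.//table)]` is just one way to select tables that contain no further tables:

```php
<?php
// Hypothetical sample that mirrors the nesting in the question:
// outer layout table > timetable > one small table per lesson.
$html = <<<HTML
<table>
<tr><td>
<table border=1>
<tr><td class="tableHeader"></td><td class="tableHeader">Maandag</td><td class="tableHeader">Dinsdag</td></tr>
<tr>
<td class="tableHeader">4e uur</td>
<td class="tableCell"><table><tr>
<td class="tableCell"></td><td class="tableCell"> </td><td class="tableCell"></td><td class="tableCell"></td>
</tr></table></td>
<td class="tableCell"><table><tr>
<td class="tableCell">DER</td><td class="tableCell"> </td><td class="tableCell">HE08</td><td class="tableCell"> </td><td class="tableCell">ges</td><td class="tableCell">1</td>
</tr></table></td>
</tr>
</table>
</td></tr>
</table>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$arr = [];
// Select only the innermost tables, i.e. tables without a nested <table>.
foreach ($xpath->query('//table[not(.//table)]') as $table) {
    $parts = [];
    foreach ($table->getElementsByTagName('td') as $td) {
        // Treat non-breaking spaces like ordinary spaces, then trim.
        $text = trim(str_replace("\u{00A0}", ' ', $td->textContent));
        if ($text !== '') {
            $parts[] = $text;   // skip spacer and empty cells
        }
    }
    // One array value per lesson; an empty string marks a free period.
    $arr[] = implode(' ', $parts);
}

print_r($arr);
```

The array is zero-based and keeps empty periods as empty strings, so the position of each entry still tells you which day and hour it belongs to. Filter the empty strings out if you only need the filled lessons.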
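Answer 1 (score: 0)
My guess is that the HTML/XML being invalid is what's tripping you up.
You're using the loadHTMLFile() function, which may tolerate malformed HTML to some extent, but it may also require valid HTML/XML.
If it does require valid XML, then what's probably happening is that the "<br>" isn't interpreted as a standalone node but as the start of a node, meaning everything after it becomes a child of the "<br>".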
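One way to test that guess rather than rely on it is to load a small fragment and inspect where the nodes end up. This is only a sketch with a made-up fragment; the variable names are illustrative. With the HTML loader, `<br>` is handled as a void element, so it is expected to have no children and the table should remain a child of the surrounding `div`:

```php
<?php
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
// Hypothetical fragment: an unclosed <br> followed by a table.
$dom->loadHTML('<div><br><table><tr><td>cell</td></tr></table></div>');

$br    = $dom->getElementsByTagName('br')->item(0);
$table = $dom->getElementsByTagName('table')->item(0);

// Does the <br> swallow the content that follows it?
$brHasChildren = $br->hasChildNodes();
// Which element did the table end up under?
$tableParent = $table->parentNode->nodeName;

var_dump($brHasChildren);
echo $tableParent, PHP_EOL;
```

If the table's parent comes back as `div`, the `<br>` is not the reason the rows are missing, and the nesting of the tables is the more likely cause.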
Also, this line doesn't make any sense:
<p font class="Header">Basisrooster schooljaar 2011 2012 (m.i.v. 12-09-11)</font></p>
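The <font> tag has been obsolete for years and should never be used, but more importantly this isn't a font tag at all. It's a p tag, yet it gets closed as if it were a font tag. Just do: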
<p class="Header">Basisrooster schooljaar 2011 2012 (m.i.v. 12-09-11)</p>
So the cause is probably that your HTML/XML is invalid.
(Dan Bizdadea also makes a good point.)
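To see what the parser actually objected to, the collected libxml errors can be printed after loading. A rough sketch, using a shortened copy of the header line from the question as input; the exact messages depend on the installed libxml2 version, so none are shown here:

```php
<?php
libxml_use_internal_errors(true);   // buffer parser errors instead of emitting warnings
$dom = new DOMDocument();
// Shortened copy of the questionable line: a <p> closed with a stray </font>.
$dom->loadHTML('<p font class="Header">Basisrooster</font></p>');

// List whatever the parser complained about.
foreach (libxml_get_errors() as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
libxml_clear_errors();

// Check what survived parsing.
$p     = $dom->getElementsByTagName('p')->item(0);
$class = $p->getAttribute('class');
$text  = $p->textContent;

echo $class, ': ', $text, PHP_EOL;
```

If the paragraph's class and text are still readable after loading, the markup problems are being recovered from and the fix belongs in how the tables are traversed, not in the HTML itself.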