Parsing HTML table data with DOMXPath

Time: 2014-03-03 02:51:35

Tags: php html dom domxpath

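I am scraping data from an external HTML table of 100 rows by 3 columns. I want to parse the data into a 10x10 table in which the data from each original row is combined into a single cell. For example, I want to turn: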

<tr>
    <td>info1</td>
    <td>info2</td>
    <td>info3</td>
</tr>
<tr>
    <td>info4</td>
    <td>info5</td>
    <td>info6</td>
</tr>
<tr>
    <td>info7</td>
    <td>info8</td>
    <td>info9</td>
</tr>
...and so on

into

<tr>
   <td>info1<br/>info2<br/>info3</td>
   <td>info4<br/>info5<br/>info6</td>
   <td>info7<br/>info8<br/>info9</td>
   ...7 more times
</tr>
...9 more times

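I can output the data into a single column using line breaks, but I have no idea how to get the layout I actually want. I would also like to be able to style the data with CSS. Any help or direction is appreciated. Here is my code: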

  $doc = new DOMDocument();
  $doc->loadHTML($html);
  libxml_clear_errors(); //remove errors for yucky html

  $xpath = new DOMXPath($doc);
  $table = $xpath->query('//table[@id="idTable"]')->item(0);
  $rows = $table->getElementsByTagName("tr");

  foreach($rows as $row)
    {
      $cells = $row -> getElementsByTagName('td');
      foreach ($cells as $cell) print $cell->nodeValue . "<br/>";
    }

1 answer:

Answer 0 (score: 1)

There are two (similar) ways to do this:

1) By counting the <tr> elements and merging each one into a single cell, regardless of how many <td> elements it contains:

$doc=new DOMDocument();
$doc->loadHTML($html);
$xpath=new DOMXPath($doc);
echo "<table>\n";
/* 10 is the row count */
for($i=0;$i<10;$i++)
{
    echo "<tr>\n";
    /* 10 is the column count */
    foreach($xpath->query('//table[@id="myTable"]/tr[position()>'.($i*10).' and position()<'.(($i+1)*10+1).']') as $tr)
    {
        echo "\t<td>";// "\t" to make it look nice
        $tds=array();
        foreach($tr->childNodes as $td)
        {
            if($td->nodeName!="td") continue;
            $tds[]=$td->firstChild->nodeValue;
        }
        echo implode("<br />",$tds);
        echo "</td>\n";
    }
    echo "</tr>\n";
}
echo "</table>";
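
The position() window in the query above may be easier to follow when the bounds are computed separately. The following is only a minimal sketch: the helper rowWindowPredicate() and the made-up 6-row table (regrouped 3 rows at a time) are assumptions for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: builds the XPath predicate selecting the source rows
// that belong to output row $i (0-based) when $perRow source rows are merged
// into each output row.
function rowWindowPredicate(int $i, int $perRow): string
{
    $first = $i * $perRow + 1;    // 1-based position of the first source row
    $last  = ($i + 1) * $perRow;  // 1-based position of the last source row
    return "position()>=$first and position()<=$last";
}

// Made-up sample table: 6 rows with one cell each.
$html = '<table id="myTable">';
for ($n = 1; $n <= 6; $n++) {
    $html .= "<tr><td>r$n</td></tr>";
}
$html .= '</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Collect the text of the source rows that fall into each output row.
$groups = [];
for ($i = 0; $i < 2; $i++) {
    $query = '//table[@id="myTable"]/tr[' . rowWindowPredicate($i, 3) . ']';
    $texts = [];
    foreach ($xpath->query($query) as $tr) {
        $texts[] = trim($tr->textContent);
    }
    $groups[] = $texts;
}
print_r($groups);
```

With 10 source rows per output row, the same helper yields the window 1 to 10 for $i = 0, which matches the bounds position()>0 and position()<11 used in the answer's query.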


2) By counting the <td> elements and merging every 3 of them into one new <td>, and every 30 of them into one new <tr>, ignoring the <tr> elements:

$doc=new DOMDocument();
$doc->loadHTML($html);
$xpath=new DOMXPath($doc);
echo "<table>\n";
$i=0;
$tds=array();
foreach($xpath->query('//table[@id="myTable"]/tr/td/text()') as $td)
{
    /* 30 is each row's old-cell-count */
    if($i%30==0) echo "<tr>\n";
    $tds[]=$td->nodeValue;
    /* 3 is each cell's old-cell-count */
    if($i%3==2)
    {
        echo "\t<td>".implode("<br />",$tds)."</td>\n";
        $tds=array();
    }
    if($i%30==29) echo "</tr>\n";
    $i++;
}
echo "</table>";
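
The modulo counters above assume that the number of cells is an exact multiple of 30; otherwise the last <tr> is never closed. As a hedged alternative, the same grouping can be sketched with array_chunk(), which keeps a short final group intact. The function regroupCells() and the sample values are hypothetical and are not part of the original answer.

```php
<?php
// Hypothetical helper: merges every $perCell values into one <td> and every
// $cellsPerRow merged cells into one <tr>. A short final group is still
// emitted as a complete cell or row.
function regroupCells(array $values, int $perCell, int $cellsPerRow): string
{
    $out = "<table>\n";
    $cells = array_chunk($values, $perCell);
    foreach (array_chunk($cells, $cellsPerRow) as $row) {
        $out .= "<tr>\n";
        foreach ($row as $cell) {
            // Escape the scraped text before placing it back into markup.
            $escaped = array_map('htmlspecialchars', $cell);
            $out .= "\t<td>" . implode("<br />", $escaped) . "</td>\n";
        }
        $out .= "</tr>\n";
    }
    return $out . "</table>";
}

// Made-up input: 7 values, 3 per cell, 2 cells per row.
$values = ['a', 'b', 'c', 'd', 'e', 'f', 'g'];
echo regroupCells($values, 3, 2);
```

For the asker's table, the values would come from the text() query shown above, and the call would be regroupCells($values, 3, 10).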


Both produce this output:

<table>
<tr>
    <td>info0.1<br />info0.2<br />info0.3</td>
    <td>info1.1<br />info1.2<br />info1.3</td>
    <td>info2.1<br />info2.2<br />info2.3</td>
    <td>info3.1<br />info3.2<br />info3.3</td>
    <td>info4.1<br />info4.2<br />info4.3</td>
    <td>info5.1<br />info5.2<br />info5.3</td>
    <td>info6.1<br />info6.2<br />info6.3</td>
    <td>info7.1<br />info7.2<br />info7.3</td>
    <td>info8.1<br />info8.2<br />info8.3</td>
    <td>info9.1<br />info9.2<br />info9.3</td>
</tr>
<tr>
    <td>info10.1<br />info10.2<br />info10.3</td>
    <td>info11.1<br />info11.2<br />info11.3</td>
<!-- ... -->
    <td>info97.1<br />info97.2<br />info97.3</td>
    <td>info98.1<br />info98.2<br />info98.3</td>
    <td>info99.1<br />info99.2<br />info99.3</td>
</tr>
</table>
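
The question also asks about styling the result with CSS. One possible approach, sketched here under assumptions, is to build the output with DOMDocument and attach class attributes to the generated elements. The class names merged and merged-cell and the sample data are hypothetical; neither answer method above includes this step.

```php
<?php
// Made-up sample data: each inner array is one merged cell.
$groups = [
    ['info0.1', 'info0.2', 'info0.3'],
    ['info1.1', 'info1.2', 'info1.3'],
];

$out = new DOMDocument();
$table = $out->createElement('table');
$table->setAttribute('class', 'merged');        // hypothetical class name
$out->appendChild($table);

$tr = $out->createElement('tr');
$table->appendChild($tr);

foreach ($groups as $group) {
    $td = $out->createElement('td');
    $td->setAttribute('class', 'merged-cell');  // hypothetical class name
    foreach ($group as $k => $text) {
        if ($k > 0) {
            // Separate the merged values with a line break element.
            $td->appendChild($out->createElement('br'));
        }
        // Text nodes are escaped automatically when the tree is serialized.
        $td->appendChild($out->createTextNode($text));
    }
    $tr->appendChild($td);
}

// Serialize only the table element, not a whole HTML document.
$markup = $out->saveHTML($table);
echo $markup;
```

A stylesheet rule such as td.merged-cell { vertical-align: top; } could then target the generated cells.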