Looping through table rows until a known element with PHP Simple HTML DOM Parser

Time: 2011-10-14 11:09:39

Tags: php xml simple-html-dom

OK, I'm trying to build an XML feed from this HTML table using PHP Simple HTML DOM Parser.

<table>
<tr><td colspan="5"><strong>Saturday October 15 2011</strong></td></tr>

<tr><td>Team 1</td>     <td>vs</td>     <td>Team 7</td> <td>3:00 pm</td></tr>
<tr><td>Team 2</td>     <td>vs</td>     <td>Team 12</td>    <td>3:00 pm</td></tr>
<tr><td>Team 3</td>     <td>vs</td>     <td>Team 8</td> <td>3:00 pm</td></tr>
<tr><td>Team 4</td>     <td>vs</td>     <td>Team 10</td>    <td>3:00 pm</td></tr>
<tr><td>Team 5</td>     <td>vs</td>     <td>Team 11</td>    <td>3:00 pm</td></tr>

<tr><td colspan="5"><strong>Monday October 17 2011</strong></td></tr>

<tr><td>Team 6</td>     <td>vs</td>     <td>Team 9</td> <td>7:45 pm</td></tr>

<tr><td colspan="5"><strong>Saturday October 22 2011</strong></td></tr>

<tr><td>Team 7</td>     <td>vs</td>     <td>Team 12</td>    <td>3:00 pm</td></tr>
<tr><td>Team 1</td>     <td>vs</td>     <td>Team 2</td> <td>3:00 pm</td></tr>
<tr><td>Team 8</td>     <td>vs</td>     <td>Team 4</td> <td>3:00 pm</td></tr>
<tr><td>Team 3</td>     <td>vs</td>     <td>Team 6</td> <td>3:00 pm</td></tr>
<tr><td>Team 9</td>     <td>vs</td>     <td>Team 5</td> <td>3:00 pm</td></tr>
<tr><td>Team 10</td>        <td>vs</td>     <td>Team 11</td>    <td>3:00 pm</td></tr>
</table>

My goal is to extract each date and then the rows that follow it, up to the next date, so that I can build one XML node per date:

<matchday date="Saturday October 15 2011">
    <fixture>
        <hometeam>Team 1</hometeam>
        <awayteam>Team 7</awayteam>
        <kickoff>3:00 pm</kickoff>
    </fixture>
    <fixture>
        <hometeam>Team 2</hometeam>
        <awayteam>Team 12</awayteam>
        <kickoff>3:00 pm</kickoff>
    </fixture>
</matchday>

So far I can find every date in the HTML and build its XML node:

$dateNodes = $html->find('table tr td[colspan="5"] strong');

foreach($dateNodes as $date){
    echo '<matchday date="'.trim($date->innertext).'">';
    // FIXTURES

    // END FIXTURES
    echo '</matchday>';
}

How would I go about getting the team names etc. for each fixture, up until the next matchday?

1 Answer:

Answer 0: (score: 2)

Instead of SimpleHtmlDom (which I believe is a craptaculous library), you can use an XSLT transformation with PHP's native XSLT processor:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="xml"/>
  <xsl:template match="/">
    <matchdays>
      <xsl:for-each select="table/tr[td[@colspan=5]]">
        <matchday>
          <xsl:attribute name="date">
            <xsl:value-of select="td/strong"/>
          </xsl:attribute>
          <xsl:for-each select="following-sibling::tr[
            not(td[@colspan]) and 
            preceding-sibling::tr[td[@colspan]][1] = current()
          ]">
            <fixture>
              <hometeam><xsl:value-of select="td[1]"/></hometeam>
              <awayteam><xsl:value-of select="td[3]"/></awayteam>
              <kickoff><xsl:value-of select="td[4]"/></kickoff>
            </fixture>
          </xsl:for-each>                   
        </matchday>
      </xsl:for-each>
    </matchdays>
  </xsl:template>   
</xsl:stylesheet>

Then just use the code given in the example at http://php.net/manual/en/xsltprocessor.transformtoxml.php to transform the HTML to XML:

$xml = new DOMDocument;
$xml->load('YourSourceFile.xml');
$xsl = new DOMDocument;
$xsl->load('YourStyleSheet.xsl');
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);
echo $proc->transformToXML($xml);

Demo at Codepad
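
For anyone who wants to try the transformation without creating files, here is a minimal self-contained sketch. It assumes the xsl extension is enabled and that the table is well-formed XML. The shortened inline table and the variable names `$source`, `$stylesheet` and `$result` are my own illustrations, not part of the original demo.

```php
<?php
// Sketch only: a shortened, well-formed copy of the table (assumption).
$source = <<<'XML'
<table>
<tr><td colspan="5"><strong>Saturday October 15 2011</strong></td></tr>
<tr><td>Team 1</td><td>vs</td><td>Team 7</td><td>3:00 pm</td></tr>
<tr><td>Team 2</td><td>vs</td><td>Team 12</td><td>3:00 pm</td></tr>
<tr><td colspan="5"><strong>Monday October 17 2011</strong></td></tr>
<tr><td>Team 6</td><td>vs</td><td>Team 9</td><td>7:45 pm</td></tr>
</table>
XML;

// Same stylesheet as above, held in a nowdoc so PHP does not interpolate it.
$stylesheet = <<<'XSL'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="xml"/>
  <xsl:template match="/">
    <matchdays>
      <xsl:for-each select="table/tr[td[@colspan=5]]">
        <matchday>
          <xsl:attribute name="date">
            <xsl:value-of select="td/strong"/>
          </xsl:attribute>
          <xsl:for-each select="following-sibling::tr[
            not(td[@colspan]) and
            preceding-sibling::tr[td[@colspan]][1] = current()
          ]">
            <fixture>
              <hometeam><xsl:value-of select="td[1]"/></hometeam>
              <awayteam><xsl:value-of select="td[3]"/></awayteam>
              <kickoff><xsl:value-of select="td[4]"/></kickoff>
            </fixture>
          </xsl:for-each>
        </matchday>
      </xsl:for-each>
    </matchdays>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xml = new DOMDocument;
$xml->loadXML($source);          // parse the table as XML
$xsl = new DOMDocument;
$xsl->loadXML($stylesheet);      // parse the stylesheet
$proc = new XSLTProcessor;
$proc->importStylesheet($xsl);
$result = $proc->transformToXml($xml);
echo $result;
```

Note that `load()` and `loadXML()` need well-formed input. If the real page is tag soup, `loadHTML()` is more forgiving, but it wraps the table in `html` and `body` elements, so the `table/tr` path in the stylesheet would have to become something like `//table/tr`.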


Instead of using XSLT, you can also use PHP's native DOM extension:

$xml = new DOMDocument;
$xml->loadHtmlFile('YourHtmlFile.xml');
$xp = new DOMXPath($xml);   
$new = new DOMDocument('1.0', 'utf-8');
$new->appendChild($new->createElement('matchdays'));
foreach ($xp->query('//table/tr/td[@colspan=5]/strong') as $gameDate) {
    $matchDay = $new->createElement('matchday');
    $matchDay->setAttribute('date', $gameDate->nodeValue);
    foreach ($xp->query(
        sprintf(
            '//tr[
                not(td[@colspan]) and
                preceding-sibling::tr[td[@colspan]][1]/td/strong/text() = "%s"
            ]',
            $gameDate->nodeValue
        )
    ) as $gameData) {
        $tds = $gameData->getElementsByTagName('td');
        $fixture = $matchDay->appendChild($new->createElement('fixture'));
        $fixture->appendChild($new->createElement(
            'hometeam', $tds->item(0)->nodeValue)
        );
        $fixture->appendChild($new->createElement(
            'awayteam', $tds->item(2)->nodeValue)
        );
        $fixture->appendChild($new->createElement(
            'kickoff', $tds->item(3)->nodeValue)
        );
    }
    $new->documentElement->appendChild($matchDay);
}
$new->formatOutput = true;
echo $new->saveXML();

Demo at Codepad
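
The DOM version above runs one XPath query per date and matches rows by the date text. As a possible alternative, here is a sketch that walks the rows once and opens a new matchday whenever it meets a date row. It is not from the original answer: the inline HTML and the variable names are assumptions for illustration. It also assumes, as in the table above, that date rows are the ones whose first cell has a `colspan` attribute.

```php
<?php
// Sketch only: a shortened copy of the table (assumption).
$html = <<<'HTML'
<table>
<tr><td colspan="5"><strong>Saturday October 15 2011</strong></td></tr>
<tr><td>Team 1</td><td>vs</td><td>Team 7</td><td>3:00 pm</td></tr>
<tr><td>Team 2</td><td>vs</td><td>Team 12</td><td>3:00 pm</td></tr>
<tr><td colspan="5"><strong>Monday October 17 2011</strong></td></tr>
<tr><td>Team 6</td><td>vs</td><td>Team 9</td><td>7:45 pm</td></tr>
</table>
HTML;

$doc = new DOMDocument;
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

$out = new DOMDocument('1.0', 'utf-8');
$root = $out->appendChild($out->createElement('matchdays'));
$matchDay = null; // the <matchday> element currently being filled

foreach ($xp->query('//table//tr') as $row) {
    $tds = $row->getElementsByTagName('td');
    if ($tds->length === 0) {
        continue; // no cells, nothing to read
    }
    if ($tds->item(0)->hasAttribute('colspan')) {
        // Date row: start a new matchday and make it the current one.
        $matchDay = $root->appendChild($out->createElement('matchday'));
        $matchDay->setAttribute('date', trim($tds->item(0)->textContent));
        continue;
    }
    if ($matchDay === null || $tds->length < 4) {
        continue; // fixture before any date, or a row of unexpected shape
    }
    // Fixture row: attach it to the most recent matchday.
    $fixture = $matchDay->appendChild($out->createElement('fixture'));
    foreach (['hometeam' => 0, 'awayteam' => 2, 'kickoff' => 3] as $name => $i) {
        $el = $fixture->appendChild($out->createElement($name));
        // createTextNode escapes characters such as & in team names.
        $el->appendChild($out->createTextNode(trim($tds->item($i)->textContent)));
    }
}

$out->formatOutput = true;
$result = $out->saveXML();
echo $result;
```

Because each fixture is attached to whichever matchday was seen last, this does not rely on the date strings being unique, and no date text has to be quoted inside an XPath expression.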