Simple HTML DOM Parser ignores a table

Date: 2015-09-07 20:43:41

Tags: php parsing html-parsing simple-html-dom html-parser

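I'm trying to parse a specific table that contains the results of Argentina's last presidential election. The HTML contains 24 tables with data for each province, followed by two more tables. The one I'm trying to parse is the 25th table.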

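Simple HTML DOM Parser works fine when parsing any of the first 24 tables, but for some reason it doesn't work on the 25th. The tables have almost the same structure, so I doubt the structure has anything to do with it.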

HTML of the tables:

Table 24

 <table>
 <thead>
     <tr>
         <th colspan='2' class='nombre'>San Luis</th>
     </tr>
 </thead>
 <tbody>
     <tr>
         <td>Mesas totales</td>
         <td class='mestot'>1.129</td>
     </tr>
     <tr>
         <td>Mesas escrutadas</td>
         <td class='mesesc'>1.099</td>
     </tr>
     <tr>
         <td>% mesas escrutadas</td>
         <td class='pmesesc'>97,34%</td>
     </tr>
 </tbody>

Table 25

<table id='tablaagrupaciones'>
<colgroup>
    <col width='70%' />
    <col width='15%' />
    <col width='15%' />
</colgroup>
<thead>
    <tr>
        <th class='literal'>Agrupaciones políticas / Fórmulas</th>
        <th class='literal' colspan='2'>Votos</th>
    </tr>
</thead>
<tbody>
    <tr class='r1 agrup'>
        <td class='denom'>ALIANZA FRENTE PARA LA VICTORIA </td>
        <td class='vot'>8.424.749</td>
        <td class='pvot'>38,41%</td>
    </tr>
    <tr class='r1 lista'>
        <td class='denom'>SCIOLI, DANIEL OSVALDO - ZANNINI, CARLOS ALBERTO </td>
        <td class='vot'>8.424.749</td>
        <td class='pvot'>100%</td>
    </tr>
    <tr class='r2 agrup'>
        <td class='denom'>ALIANZA CAMBIEMOS </td>
        <td class='vot'>6.595.914</td>
        <td class='pvot'>30,07%</td>
    </tr>
    <tr class='r2 lista'>
        <td class='denom'>MACRI, MAURICIO - MICHETTI, MARTA GABRIELA </td>
        <td class='vot'>5.325.990</td>
        <td class='pvot'>80,75%</td>
    </tr>
    <tr class='r2 lista'>
        <td class='denom'>SANZ, ERNESTO RICARDO - LLACH, LUCAS </td>
        <td class='vot'>756.777</td>
        <td class='pvot'>11,47%</td>
    </tr>
    <tr class='r2 lista'>
        <td class='denom'>CARRIO, ELISA MARIA AVELINA - FLORES, HECTOR ANTONIO </td>
        <td class='vot'>513.147</td>
        <td class='pvot'>7,78%</td>
    </tr>
    <tr class='r1 agrup'>
        <td class='denom'>ALIANZA UNIDOS POR UNA NUEVA ALTERNATIVA (UNA) </td>
        <td class='vot'>4.525.497</td>
        <td class='pvot'>20,63%</td>
    </tr>
    <tr class='r1 lista'>
        <td class='denom'>MASSA, SERGIO TOMAS - RUBERTO SAENZ, GUSTAVO ADOLFO </td>
        <td class='vot'>3.121.589</td>
        <td class='pvot'>68,98%</td>
    </tr>
    <tr class='r1 lista'>
        <td class='denom'>DE LA SOTA, JOSE MANUEL - RUCCI, CLAUDIA MONICA </td>
        <td class='vot'>1.403.908</td>
        <td class='pvot'>31,02%</td>
    </tr>
    <tr class='r2 agrup'>
        <td class='denom'>ALIANZA PROGRESISTAS </td>
        <td class='vot'>769.316</td>
        <td class='pvot'>3,51%</td>
    </tr>
    <tr class='r2 lista'>
        <td class='denom'>STOLBIZER, MARGARITA ROSA - OLAVIAGA, MIGUEL ANGEL </td>
        <td class='vot'>769.316</td>
        <td class='pvot'>100%</td>
    </tr>
    <tr class='r1 agrup'>
        <td class='denom'>ALIANZA FRENTE DE IZQUIERDA Y DE LOS TRABAJADORES </td>
        <td class='vot'>726.054</td>
        <td class='pvot'>3,31%</td>
    </tr>
    <tr class='r1 lista'>
        <td class='denom'>DEL CAÑO, NICOLAS - BREGMAN, MYRIAM TERESA </td>
        <td class='vot'>370.764</td>
        <td class='pvot'>51,07%</td>
    </tr>
    <tr class='r1 lista'>
        <td class='denom'>ALTAMIRA, JORGE - GIORDANO, JUAN CARLOS </td>
        <td class='vot'>355.290</td>
        <td class='pvot'>48,93%</td>
    </tr>
    <tr class='r2 agrup'>
        <td class='denom'>ALIANZA COMPROMISO FEDERAL </td>
        <td class='vot'>462.304</td>
        <td class='pvot'>2,11%</td>
    </tr>
    <tr class='r2 lista'>
        <td class='denom'>RODRIGUEZ SAA, ADOLFO - NEGRE DE ALONSO, LILIANA TERESITA </td>
        <td class='vot'>462.304</td>
        <td class='pvot'>100%</td>
    </tr>
    <tr class='r1 agrup'>
        <td class='denom'>ALIANZA FRENTE POPULAR </td>
        <td class='vot'>109.141</td>
        <td class='pvot'>0,50%</td>
    </tr>
    <tr class='r1 lista'>
        <td class='denom'>DE GENNARO, VICTOR NORBERTO - CODONI, EVANGELINA SOLEDAD </td>
        <td class='vot'>109.141</td>
        <td class='pvot'>100%</td>
    </tr>
    <tr class='r2 agrup'>
        <td class='denom'>MOVIMIENTO AL SOCIALISMO </td>
        <td class='vot'>102.969</td>
        <td class='pvot'>0,47%</td>
    </tr>
    <tr class='r2 lista'>
        <td class='denom'>CASTAÑEIRA, MANUELA JIMENA - AYALA, JORGE LUIS </td>
        <td class='vot'>102.969</td>
        <td class='pvot'>100%</td>
    </tr>
    <tr class='r1 agrup'>
        <td class='denom'>MST - NUEVA IZQUIERDA </td>
        <td class='vot'>96.414</td>
        <td class='pvot'>0,44%</td>
    </tr>
    <tr class='r1 lista'>
        <td class='denom'>BODART, HUGO ALEJANDRO - RIPOLL, VILMA ANA </td>
        <td class='vot'>96.414</td>
        <td class='pvot'>100%</td>
    </tr>
    <tr class='r2 agrup'>
        <td class='denom'>PARTIDO POPULAR </td>
        <td class='vot'>82.900</td>
        <td class='pvot'>0,38%</td>
    </tr>
    <tr class='r2 lista'>
        <td class='denom'>YATTAH, MAURICIO JORGE - MORETTA, MARIA BELEN </td>
        <td class='vot'>82.900</td>
        <td class='pvot'>100%</td>
    </tr>
    <tr class='r1 agrup'>
        <td class='denom'>MOVIMIENTO DE ACCION VECINAL </td>
        <td class='vot'>41.214</td>
        <td class='pvot'>0,17%</td>
    </tr>
    <tr class='r1 lista'>
        <td class='denom'>ALBARRACIN, RAUL HUMBERTO - DIB, GASTON </td>
        <td class='vot'>41.214</td>
        <td class='pvot'>100%</td>
    </tr>
</tbody>

The full HTML I'm trying to parse is:

http://www.resultados.gob.ar/web/dat99/DPR99999A.htm

My code:

include_once 'simple_html_dom.php';


$html = file_get_html('http://www.resultados.gob.ar/web/dat99/DPR99999A.htm');



/*----------------------24TH TABLE: SAN LUIS----------------*/
//this table (and the previous ones) are parsed just fine

$table = $html->find('table',23);
$rowData = array();


foreach($table->find('tr') as $row) {
    // initialize array to store the cell data from each row
    $flight = array();
    foreach($row->find('td') as $cell) {
        // push the cell's text to the array
        $flight[] = $cell->plaintext;
    }
    $rowData[] = $flight;
}

echo '<table><thead><tr><th>Table 24 - San Luis</th></tr></thead>';
foreach ($rowData as $row => $tr) {
    echo '<tr>'; 
    foreach ($tr as $td)
        echo '<td>' . $td .'</td>';
    echo '</tr>';
}
echo '</table>';


/*----------------------25TH TABLE: THE RESULTS I'M INTERESTED IN----------------*/
$table = $html->find('table',24);
$rowData = array();


foreach($table->find('tr') as $row) {
    // initialize array to store the cell data from each row
    $flight = array();
    foreach($row->find('td') as $cell) {
        // push the cell's text to the array
        $flight[] = $cell->plaintext;
    }
    $rowData[] = $flight;
}

echo '<table><thead><tr><th>Tabla 25 - Tabla with results</th></tr></thead>';
foreach ($rowData as $row => $tr) {
    echo '<tr>'; 
    foreach ($tr as $td)
        echo '<td>' . $td .'</td>';
    echo '</tr>';
}
echo '</table>';
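
Table 25 is the only table shown above that carries an id (`tablaagrupaciones`), so one option is to address it by id instead of by position. The sketch below illustrates that idea with PHP's built-in `DOMDocument` and `DOMXPath` rather than Simple HTML DOM. The `$sample` markup is a shortened stand-in for the real page and `extractRows` is a hypothetical helper. It does not explain why `find('table',24)` fails; it only shows an alternative way to reach the same rows.

```php
<?php
// Shortened, hypothetical stand-in for the real page; only the id and
// class names are taken from the HTML quoted in the question.
$sample = <<<HTML
<html><body>
<table><tbody><tr><td>Mesas totales</td><td class='mestot'>1.129</td></tr></tbody></table>
<table id='tablaagrupaciones'>
<thead>
    <tr><th class='literal'>Agrupaciones</th><th class='literal' colspan='2'>Votos</th></tr>
</thead>
<tbody>
    <tr class='r1 agrup'>
        <td class='denom'>ALIANZA FRENTE PARA LA VICTORIA </td>
        <td class='vot'>8.424.749</td>
        <td class='pvot'>38,41%</td>
    </tr>
    <tr class='r2 agrup'>
        <td class='denom'>ALIANZA CAMBIEMOS </td>
        <td class='vot'>6.595.914</td>
        <td class='pvot'>30,07%</td>
    </tr>
</tbody>
</table>
</body></html>
HTML;

/**
 * Hypothetical helper: collect the trimmed text of every <td>, row by row,
 * from the table with the given id. Rows without <td> cells (such as the
 * header row) are skipped. Returns an empty array if the table is missing.
 */
function extractRows(string $markup, string $tableId): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // $tableId is assumed to be a trusted literal, not user input.
    foreach ($xpath->query("//table[@id='$tableId']//tr") as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

$rows = extractRows($sample, 'tablaagrupaciones');
print_r($rows);
```

Simple HTML DOM also accepts id selectors, so `$html->find('#tablaagrupaciones', 0)` may be worth trying in the original code; whether it succeeds on this particular page is untested here. Checking the returned value for `null` before calling `find('tr')` on it would at least show whether the parser sees the table at all.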

0 answers:

No answers