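How to parse an HTML table into a multidimensional array

Date: 2016-02-25 08:28:57

Tags: php html arrays multidimensional-array

I am trying to parse an HTML table into a multidimensional array and store the array in a database.

The HTML of my table looks like this: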

 <div class="list">
        <table cellspacing="0">
        <tr class="tr-hover">
        <th rowspan="15" scope="row">Network</th>
        <td class="ttl"><a href="network-bands.php3">Technology</a></td>
        <td class="nfo"><a href="#" class="link-network-detail">GSM / HSPA / LTE</a></td>
        </tr>
        <tr class="tr-toggle">
        <td class="ttl"><a href="network-bands.php3">2G bands</a></td>
        <td class="nfo">GSM 850 / 900 / 1800 / 1900 - SIM 1 & SIM 2 (optional)</td>
        </tr><tr class="tr-toggle">
        <td class="ttl"><a href="network-bands.php3">3G bands</a></td>
        <td class="nfo">HSDPA 850 / 900 / 1900 / 2100 </td>
        </tr>
        <tr class="tr-toggle">
        <td class="ttl"><a href="network-bands.php3">4G bands</a></td>
        <td class="nfo"> LTE</td>
        </tr>
        <tr class="tr-toggle">
        <td class="ttl"><a href="glossary.php3?term=3g">Speed</a></td>
        <td class="nfo">HSPA 42.2/5.76 Mbps, LTE Cat9 450/50 Mbps</td>
        </tr>

        <tr class="tr-toggle">
        <td class="ttl"><a href="glossary.php3?term=gprs">GPRS</a></td>
        <td class="nfo">Yes</td>
        </tr>   
        <tr class="tr-toggle">
        <td class="ttl"><a href="glossary.php3?term=edge">EDGE</a></td>
        <td class="nfo">Yes</td>
        </tr>
        </table>


        <table cellspacing="0">
        <tr>
        <th rowspan="2" scope="row">Launch</th>
        <td class="ttl"><a href=# onClick="helpW('h_year.htm');">Announced</a></td>
        <td class="nfo">2016, February</td>
        </tr>   
        <tr>
        <td class="ttl"><a href=# onClick="helpW('h_status.htm');">Status</a></td>
        <td class="nfo">Coming soon. 2016, March 11</td>
        </tr>
        </table>


        <table cellspacing="0">
        <tr>
        <th rowspan="6" scope="row">Body</th>
        <td class="ttl"><a href=# onClick="helpW('h_dimens.htm');">Dimensions</a></td>
        <td class="nfo">142.4 x 69.6 x 7.9 mm (5.61 x 2.74 x 0.31 in)</td>
        </tr><tr>
        <td class="ttl"><a href=# onClick="helpW('h_weight.htm');">Weight</a></td>
        <td class="nfo">152 g (5.36 oz)</td>
        </tr>
        <tr>
        <td class="ttl"><a href="glossary.php3?term=build">Build</a></td>
        <td class="nfo">Corning Gorilla Glass back panel (unspecified version)</td>
        </tr>
        <tr>
        <td class="ttl"><a href="glossary.php3?term=sim">SIM</a></td>
        <td class="nfo">Single SIM (Nano-SIM) or Dual SIM (Nano-SIM, dual stand-by)</td>
        </tr>
        <tr><td class="ttl">&nbsp;</td><td class="nfo">- Samsung Pay (Visa, MasterCard certified)<br />
        - IP68 certified - dust proof and water resistant over 1.5 meter and 30 minutes</td></tr>

        </table>
</div>

From this I want to create an array like this:

    array (
        ['Network'] =>
        array (
            ['technology'] => 'GSM / HSPA / LTE',
            ['2G bands'] => 'GSM 850 / 900 / 1800 / 1900 - SIM 1 & SIM 2 (optional)',
            ...
            and so on
        ),

        ['Launch'] =>
        array (
            ['Announced'] => '2016, February',
            ...
            and so on
        ),

        ...
        and so on
    )

What I have tried so far:

I fetch the HTML with cURL and then walk it with a DOM parser, like this:

    foreach ( $e->find ( 'table' ) as $e1 ) {
        $varinfo[] = $e1->innertext;
    }
    print_r($varinfo);

This gives me:

            Array ( [0] => Network Technology GSM / HSPA / LTE 2G bands GSM 850 / 900 / 1800 / 1900 - SIM 1 & SIM 2 (optional) 3G bands HSDPA 850 / 900 / 1900 / 2100 4G bands LTE Speed HSPA 42.2/5.76 Mbps, LTE Cat9 450/50 Mbps GPRS Yes EDGE Yes [1] => Launch Announced 2016, February Status Coming soon. 2016, March 11

So can someone help me get the multidimensional array? I have been stuck on this part for a while now.

Thanks

1 answer:

Answer 0 (score: 1)

$result = [];

// Assumes $html is a Simple HTML DOM object, e.g. from str_get_html($markup)
// get each table
foreach ($html->find('table') as $t) {
  // find the table header text to index the array
  $idx = $t->find('th')[0]->plaintext;
  // no title cell seen yet in this table
  $subIdx = null;
  // loop through every td in the table
  foreach ($t->find('td') as $td) {
     if ($td->hasAttribute('class')) {
       // if it's a title we use the text to index the array
       if ($td->getAttribute('class') == 'ttl') {
           $subIdx = $td->plaintext;
           $result[$idx][$subIdx] = [];
       }
       // if it's information we store it under the most recent title
       elseif ($td->getAttribute('class') == 'nfo' && $subIdx !== null) {
           $result[$idx][$subIdx] = $td->plaintext;
       }
     }
   }
}

var_dump($result);

Result:

array(3) {
  'Network' => array(7) {
    'Technology' => string(16) "GSM / HSPA / LTE"
    '2G bands' => string(54) "GSM 850 / 900 / 1800 / 1900 - SIM 1 & SIM 2 (optional)"
    '3G bands' => string(30) "HSDPA 850 / 900 / 1900 / 2100 "
    '4G bands' => string(4) " LTE"
    'Speed' => string(41) "HSPA 42.2/5.76 Mbps, LTE Cat9 450/50 Mbps"
    'GPRS' => string(3) "Yes"
    'EDGE' => string(3) "Yes"
  }
  'Launch' => array(2) {
    'Announced' => string(14) "2016, February"
    'Status' => string(27) "Coming soon. 2016, March 11"
  }
  'Body' => array(5) {
    'Dimensions' => string(45) "142.4 x 69.6 x 7.9 mm (5.61 x 2.74 x 0.31 in)"
    'Weight' => string(15) "152 g (5.36 oz)"
    'Build' => string(54) "Corning Gorilla Glass back panel (unspecified version)"
    'SIM' => string(59) "Single SIM (Nano-SIM) or Dual SIM (Nano-SIM, dual stand-by)"
    '&nbsp;' => string(132) "- Samsung Pay (Visa, MasterCard certified) - IP68 certified - dust proof and water resistant over 1.5 meter and 30 minutes"
  }
}
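
The result above keeps stray whitespace in some values (for example " LTE") and ends up with a literal '&nbsp;' key for the unlabeled row. As a rough alternative sketch, the same walk can be done with PHP's built-in DOMDocument and DOMXPath instead of the parser used in the answer. The function names `parseSpecTables` and `cleanText` and the fallback label `Other` are hypothetical choices made for illustration; they do not come from the original answer.

```php
<?php
/**
 * Parse tables shaped like the ones in the question into
 * [section => [label => value]].
 * Hypothetical helper: the name and the "Other" fallback key are assumptions.
 */
function parseSpecTables(string $html): array
{
    $doc = new DOMDocument();
    // Scraped markup is rarely valid, so collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    // The prolog hints that the input is UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];

    foreach ($xpath->query('//table') as $table) {
        // The first <th> names the section; skip tables without one.
        $th = $xpath->query('.//th', $table)->item(0);
        if ($th === null) {
            continue;
        }
        $section = cleanText($th->textContent);

        foreach ($xpath->query('.//tr', $table) as $row) {
            // Pair the title cell and the info cell of the same row.
            $label = $xpath->query('./td[@class="ttl"]', $row)->item(0);
            $value = $xpath->query('./td[@class="nfo"]', $row)->item(0);
            if ($label === null || $value === null) {
                continue;
            }
            $key = cleanText($label->textContent);
            if ($key === '') {
                $key = 'Other'; // assumed label for rows whose title is only &nbsp;
            }
            $result[$section][$key] = cleanText($value->textContent);
        }
    }

    return $result;
}

/** Turn non-breaking spaces into spaces, collapse whitespace runs, trim. */
function cleanText(string $text): string
{
    $text = str_replace("\u{00A0}", ' ', $text);
    return trim((string) preg_replace('/\s+/u', ' ', $text));
}
```

Calling `parseSpecTables($html)` on the markup from the question should give the same three sections, with values trimmed and the unlabeled Body row stored under `Other`. Pairing cells per row, rather than remembering the last title seen, avoids attaching a value to the wrong label when a row lacks a title cell.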