Question

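I want to parse XHTML content using cURL. How can I scrape the transaction number, weight, height, and width from between the <table> tags? How do I scrape only that content from this HTML document and get it as an array using cURL?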

transactions.php

 <table border=0 cellspacing=0 width=100%>
       <tr> 
        <td colspan="2">&nbsp;</td>
      </tr>
      <tr> 
        <td width="30%" class="Mellemrubrikker">Transaction Number::</td>
        <td width="70%">24752734576547IN</td>
      </tr>
      <tr> 
        <td width="30%" class="Mellemrubrikker">Weight:</td>
        <td width="70%">0.85 kg</td>
      </tr>
      <tr> 
        <td width="30%" class="Mellemrubrikker">Length:</td>
        <td width="70%">543 mm.</td>
      </tr>
      <tr> 
        <td width="30%" class="Mellemrubrikker">Height:</td>
        <td width="70%">156 mm.</td>
      </tr>
      <tr> 
        <td width="30%" class="Mellemrubrikker">Width:</td>
        <td width="70%">61 mm.</td>
      </tr>
      <tr> 
         <td colspan="2">&nbsp;</td>
      </tr>    
    </table>

index.php

<?php
$url = "http://localhost/htmlparse/transactions.php";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
//print_r($output);
echo $output;
?>

This code fetches the entire HTML content of transactions.php. How can I get the data between the <table> tags as array values?

Answer 1

Try Simple HTML DOM, from http://simplehtmldom.sourceforge.net/

If you don't mind using Python or Perl, you can use BeautifulSoup or WWW-Mechanize.

Answer 2

I would use the Document Object Model rather than writing my own parsing code or (God forbid!) regular expressions.

Here is an example in PHP: PHP Parse HTML code
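
As a rough illustration of the DOM approach for the table in the question, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes. The function name parseTransactionTable is hypothetical (it does not come from the question or the linked example), and the sketch assumes every row of interest has a label cell with class "Mellemrubrikker" followed by a value cell, as in transactions.php. A shortened copy of the markup stands in for the cURL response so the code runs without a web server.

```php
<?php
// Hypothetical helper (not from the original post): turns the two-column
// label/value rows of the table into an associative array.
function parseTransactionTable(string $html): array
{
    // Collect libxml warnings internally instead of printing them;
    // the markup in the question is loose (unquoted attributes).
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $data = [];

    // Assumption: only rows holding a label cell with this class carry data.
    foreach ($xpath->query('//tr[td[@class="Mellemrubrikker"]]') as $row) {
        $cells = $xpath->query('td', $row);
        if ($cells->length < 2) {
            continue; // skip rows without a value cell
        }
        // Trim whitespace and the trailing colon(s) from the label.
        $label = rtrim(trim($cells->item(0)->textContent), ':');
        $value = trim($cells->item(1)->textContent);
        $data[$label] = $value;
    }

    return $data;
}

// Stand-in for the string returned by curl_exec() in index.php.
$output = <<<HTML
<table border=0 cellspacing=0 width=100%>
  <tr><td colspan="2">&nbsp;</td></tr>
  <tr>
    <td width="30%" class="Mellemrubrikker">Transaction Number::</td>
    <td width="70%">24752734576547IN</td>
  </tr>
  <tr>
    <td width="30%" class="Mellemrubrikker">Weight:</td>
    <td width="70%">0.85 kg</td>
  </tr>
</table>
HTML;

$result = parseTransactionTable($output);
print_r($result);
```

In index.php, the same function could be called on the $output returned by curl_exec() instead of the inline string. Values such as "0.85 kg" are kept as raw strings here; splitting the number from the unit would be a separate step.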
