How to extract data from a table using XPath and PHP?

Posted: 2014-06-24 05:53:33

Tags: php xpath web-scraping

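I want to extract data from the table below; its markup is given here. I'm using XPath to pull the data out of the table, but other suggestions are welcome too.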

    <div style="clear:both;" id="showPrice">
      <br>
      <table cellspacing="1">
        <tbody>
          <tr>
            <td width="50px" style="text-align: left" class="tdhead">SN</td>
            <td width="650px" style="text-align: left" class="tdhead">Companies</td>
            <td width="20px" class="tdhead">Trans</td>
            <td width="50px" class="tdhead"> Max Price</td>
            <td width="50px" class="tdhead">Min Price</td>
            <td width="50px" class="tdhead">Closing Price</td>
            <td width="50px" class="tdhead">Total Shares</td>
            <td width="50px" class="tdhead">Amount Rs.</td>
            <td width="50px" class="tdhead">Prev. Closing</td>
            <td width="20px" class="tdhead">Diff.</td>
            <td width="50px" class="tdhead">Diff. %</td>
            <td colspan="3" class="closing-price">
              <table>
                <tbody>
                  <tr>
                    <td colspan="3">365&nbsp;days</td>
                  </tr>
                  <tr>
                    <td width="50px" class="closing-price-lighter">Max Price</td>
                    <td width="50px" class="closing-price-lighter">Min Price</td>
                    <td width="50px" class="closing-price-lighter">Avg</td>
                  </tr>
                </tbody>
              </table>
            </td>
          </tr>
          <tr style="background-color: #A61A00">
            <td style="text-align: center;color:white;">1</td>
            <td style="text-align: left;padding:3px;">
              <a href="viewcompany.php?symbol=ACEDBL&amp;id=177" style="text-decoration:none;color:white;">Ace Development Bank Limited</a>
            </td>
            <td class="numeric-data">3</td>
            <td class="numeric-data">269.00</td>
            <td class="numeric-data">264.00</td>
            <td class="numeric-data" style="background-color:#99CCFF;color:black;">264.00</td>
            <td class="numeric-data">495</td>
            <td class="numeric-data">131,405</td>
            <td class="numeric-data">265.00</td>
            <td class="numeric-data">-1.00</td>
            <td class="numeric-data" style="background-color:#99CCFF;color:black;">-0.38</td>
            <td class="numeric-data" style="background-color:#99FFFF;color:black;">281</td>
            <td class="numeric-data" style="background-color:#99FFFF;color:black;">102</td>
            <td class="numeric-data" style="background-color:#99FFFF;color:black;">150.15</td>
          </tr>
        </tbody>
      </table>
    </div>

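I only want the data that comes after the `closing-price` cell. What I need is the text and numeric values in the following `td` elements of the `tr`: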

      <td style="text-align: left;padding:3px;">
        <a href="viewcompany.php?symbol=ACEDBL&amp;id=177" style="text-decoration:none;color:white;">Ace Development Bank Limited</a>
      </td>
      <td class="numeric-data">3</td>
      <td class="numeric-data">269.00</td>
      <td class="numeric-data">264.00</td>
      <td class="numeric-data" style="background-color:#99CCFF;color:black;">264.00</td>
      <td class="numeric-data">495</td>
      <td class="numeric-data">131,405</td>
      <td class="numeric-data">265.00</td>
      <td class="numeric-data">-1.00</td>
      <td class="numeric-data" style="background-color:#99CCFF;color:black;">-0.38</td>
      <td class="numeric-data" style="background-color:#99FFFF;color:black;">281</td>
      <td class="numeric-data" style="background-color:#99FFFF;color:black;">102</td>
      <td class="numeric-data" style="background-color:#99FFFF;color:black;">150.15</td>
    </tr>

I tried the following expression, but it returns no results:

  //div[@id='showPrice']/td[preceding-sibling::td[@class='closing-price']]/text()
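The expression above most likely returns nothing for three reasons: `/td` only steps to direct children of the `div`, but the cells are nested inside `table`/`tbody`/`tr`; the data cells are not siblings of the `closing-price` cell, because that cell sits in the header `tr` while the company values are in the next `tr`; and `text()` on the company cell would only return the whitespace around its `<a>`. A minimal sketch, assuming the live page has the same structure as the markup above, walks from the `closing-price` cell up to its `tr`, then across to the following `tr` siblings, and skips each row's first (SN) cell. The abbreviated HTML string and the `$data` variable are illustrative only.

```php
<?php
// Abbreviated copy of the question's markup (styles and most header cells dropped).
$html = <<<'HTML'
<div style="clear:both;" id="showPrice">
<table cellspacing="1">
<tbody>
<tr>
  <td class="tdhead">SN</td>
  <td class="tdhead">Companies</td>
  <td class="tdhead">Trans</td>
  <td colspan="3" class="closing-price">
    <table><tbody>
      <tr><td colspan="3">365&nbsp;days</td></tr>
      <tr><td>Max Price</td><td>Min Price</td><td>Avg</td></tr>
    </tbody></table>
  </td>
</tr>
<tr>
  <td>1</td>
  <td><a href="viewcompany.php?symbol=ACEDBL&amp;id=177">Ace Development Bank Limited</a></td>
  <td class="numeric-data">3</td>
  <td class="numeric-data">269.00</td>
  <td class="numeric-data">264.00</td>
  <td class="numeric-data">264.00</td>
  <td class="numeric-data">495</td>
  <td class="numeric-data">131,405</td>
  <td class="numeric-data">265.00</td>
  <td class="numeric-data">-1.00</td>
  <td class="numeric-data">-0.38</td>
  <td class="numeric-data">281</td>
  <td class="numeric-data">102</td>
  <td class="numeric-data">150.15</td>
</tr>
</tbody>
</table>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate non-validating real-world HTML
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Header row = the <tr> that directly contains the closing-price cell;
// its following <tr> siblings are the company rows.
$rows = $xpath->query("//td[@class='closing-price']/parent::tr/following-sibling::tr");

$data = [];
foreach ($rows as $row) {
    $cells = [];
    // Relative query: every child <td> of this row except the first (SN) one.
    foreach ($xpath->query('td[position() > 1]', $row) as $td) {
        // textContent also picks up text nested inside the company <a>.
        $cells[] = trim($td->textContent);
    }
    $data[] = $cells;
}

print_r($data);
```

Because the path goes through `parent::tr/following-sibling::tr`, it does not care whether the rows sit inside an explicit `<tbody>`, and on a page with many companies it returns one array per row.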

1 answer:

Answer 0 (score: 0)

You can also do it this way, by pointing the query at the specific `<tr>` element:

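$html_string = file_get_contents('http://www.sharesansar.com/today.php');

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // suppress warnings about malformed HTML
$dom->loadHTML($html_string);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$values = array();
// Cells of the 2nd <tr> of the first table inside div#showPrice (the first company row)
$row = $xpath->query('//div[@id="showPrice"]/table[1]/tr[2]/td');
foreach ($row as $value) {
    // textContent includes text nested in child elements, such as the company <a>
    $values[] = trim($value->textContent);
}

echo '<pre>';
print_r($values);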
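One caveat: `table[1]/tr[2]` only matches when the `tr` is a direct child of `table`. libxml's HTML parser, which `DOMDocument::loadHTML()` uses, does not insert the implied `<tbody>` that browsers add, so the path works on raw page source without `<tbody>` but finds nothing in markup that contains one explicitly, such as the markup copied into the question. A small sketch, assuming either variant may occur, covers both with an XPath union; the helper `rowValues()` and the two test strings are hypothetical.

```php
<?php
// Hypothetical helper: trimmed cell texts of the first data row of the
// first table in div#showPrice, with or without an explicit <tbody>.
function rowValues(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Union: the row as a direct child of <table>, or as a child of <tbody>.
    $query = '//div[@id="showPrice"]/table[1]/tr[2]/td'
           . ' | //div[@id="showPrice"]/table[1]/tbody/tr[2]/td';

    $values = [];
    foreach ($xpath->query($query) as $td) {
        $values[] = trim($td->textContent);
    }
    return $values;
}

// Two abbreviated versions of the same table: without and with <tbody>.
$withoutTbody = '<div id="showPrice"><table>'
    . '<tr><td>SN</td><td>Companies</td></tr>'
    . '<tr><td>1</td><td><a href="#">Ace Development Bank Limited</a></td></tr>'
    . '</table></div>';
$withTbody = '<div id="showPrice"><table><tbody>'
    . '<tr><td>SN</td><td>Companies</td></tr>'
    . '<tr><td>1</td><td><a href="#">Ace Development Bank Limited</a></td></tr>'
    . '</tbody></table></div>';

print_r(rowValues($withoutTbody));
print_r(rowValues($withTbody));
```

Only one branch of the union matches in a given document, so both calls should return the same two values.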

Result:

Array
(
    [0] => 1
    [1] => Ace Development Bank Limited
    [2] => 3
    [3] => 269.00
    [4] => 264.00
    [5] => 264.00
    [6] => 495
    [7] => 131,405
    [8] => 265.00
    [9] => -1.00
    [10] => -0.38
    [11] => 281
    [12] => 102
    [13] => 150.15
)
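The question asks for both the text and the numeric values, while the result above is a flat list of strings, some with thousands separators such as `"131,405"`. One possible post-processing step, sketched here under the assumption that every row has these 14 cells in this order, pairs the values with keys derived from the table's header cells and converts the numeric ones. The key names are hypothetical.

```php
<?php
// $values as printed above: one company row, all strings.
$values = ['1', 'Ace Development Bank Limited', '3', '269.00', '264.00', '264.00',
           '495', '131,405', '265.00', '-1.00', '-0.38', '281', '102', '150.15'];

// Hypothetical keys based on the header cells; the last three are the
// "365 days" Max Price / Min Price / Avg sub-columns.
$columns = ['sn', 'company', 'trans', 'max_price', 'min_price', 'closing_price',
            'total_shares', 'amount_rs', 'prev_closing', 'diff', 'diff_pct',
            'max_365d', 'min_365d', 'avg_365d'];

$record = array_combine($columns, $values);

foreach ($record as $key => $text) {
    if ($key === 'company') {
        continue;   // keep the company name as text
    }
    // Strip the thousands separator ("131,405" -> "131405") before converting.
    $clean = str_replace(',', '', $text);
    // "+ 0" yields an int for "495" and a float for "264.00".
    $record[$key] = is_numeric($clean) ? $clean + 0 : $text;
}

print_r($record);
```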