How to get all td values in PHP using regexp or any other way

Asked: 2016-09-28 04:07:30

Tags: php regex web-scraping

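I have the following table structure.

I want to get all the original part number values from the third column (Original#).

Please check the attached screenshot.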

<table class="stripeMe">
  <tbody>
    <tr>
      <th>Image</th>
      <th>Part#</th>
      <th>Original#</th>
      <th>Description</th>
    </tr>
    <tr class="alt">
      <td>
        <a href="/item/8438657/High_Capacity/AC-C27/18_TO_20_VOLT_65_WATT_AC_ADAPT"><img width="75" height="75" title="AM11X-2719 18 TO 20 Volt 65 Watt AC Adapter" src="/shop/images/image.php?img=BAT\8438657.jpg&amp;thumbnail=Y"></a>
      </td>
      <td><a title="AM11X-2719 18 TO 20 Volt 65 Watt AC Adapter" href="/item/8438657/High_Capacity/AC-C27/18_TO_20_VOLT_65_WATT_AC_ADAPT">AC-C27</a></td>
      <td>TR82J</td>
      <td>18 TO 20 Volt 65 Watt AC Adapter</td>
    </tr>
    <tr class="">
      <td>
        <a href="/item/10242499/High_Capacity/DRAC90B/DURACELL_90W_19V_UNIVERSAL_NOT"><img width="75" height="75" title="AM11X-2719 Duracell 90W 19V Universal Notebook AC Adapter" src="/shop/images/no_picture_thumb.jpg"></a>
      </td>
      <td><a title="AM11X-2719 Duracell 90W 19V Universal Notebook AC Adapter" href="/item/10242499/High_Capacity/DRAC90B/DURACELL_90W_19V_UNIVERSAL_NOT">DRAC90B</a></td>
      <td>331-0536</td>
      <td>Duracell 90W 19V Universal Notebook AC Adapter</td>
    </tr>
  </tbody>
</table>


1 Answer:

Answer 0 (score: 0)

You can use this regular expression:

<td>[A-Z0-9-]+<\/td>
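As a minimal sketch of how that pattern might be applied in PHP, the snippet below runs it through `preg_match_all`. The capture group around the character class is an addition of mine so that only the cell text is returned rather than the surrounding tags, and `$html` is a trimmed, hypothetical copy of the table from the question (in practice it would hold the fetched page). Note that the pattern only matches cells made up entirely of uppercase letters, digits, and hyphens, so it relies on the other columns not looking like that.

```php
<?php
// Hypothetical sample input: a shortened copy of the table from the question.
$html = <<<'HTML'
<table class="stripeMe">
  <tr><th>Image</th><th>Part#</th><th>Original#</th><th>Description</th></tr>
  <tr class="alt">
    <td><a href="#">image</a></td>
    <td><a href="#">AC-C27</a></td>
    <td>TR82J</td>
    <td>18 TO 20 Volt 65 Watt AC Adapter</td>
  </tr>
  <tr class="">
    <td><a href="#">image</a></td>
    <td><a href="#">DRAC90B</a></td>
    <td>331-0536</td>
    <td>Duracell 90W 19V Universal Notebook AC Adapter</td>
  </tr>
</table>
HTML;

// Same character class as the answer's pattern; the parentheses are an
// added capture group so $matches[1] holds only the text inside each cell.
preg_match_all('/<td>([A-Z0-9-]+)<\/td>/', $html, $matches);

$originalNumbers = $matches[1];
print_r($originalNumbers); // TR82J and 331-0536 for the sample above
```

The cells in the first two columns start with an `<a>` tag and the descriptions contain spaces and lowercase letters, which is why neither is picked up here.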


Must read: RegEx match open tags except XHTML self-contained tags
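Since the question also allows "any other way", here is a hedged sketch of a DOM-based alternative using PHP's built-in `DOMDocument` and `DOMXPath`, which selects the third `td` of each row by position instead of guessing from the cell's characters. It is not part of the original answer; `$html` is again a trimmed, hypothetical copy of the question's table, and the `stripeMe` class is taken from the markup shown there.

```php
<?php
// Hypothetical sample input: a shortened copy of the table from the question.
$html = <<<'HTML'
<table class="stripeMe">
  <tbody>
    <tr><th>Image</th><th>Part#</th><th>Original#</th><th>Description</th></tr>
    <tr class="alt">
      <td><a href="#">image</a></td>
      <td><a href="#">AC-C27</a></td>
      <td>TR82J</td>
      <td>18 TO 20 Volt 65 Watt AC Adapter</td>
    </tr>
    <tr class="">
      <td><a href="#">image</a></td>
      <td><a href="#">DRAC90B</a></td>
      <td>331-0536</td>
      <td>Duracell 90W 19V Universal Notebook AC Adapter</td>
    </tr>
  </tbody>
</table>
HTML;

// Real-world pages are rarely valid markup, so collect libxml warnings
// internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// td[3] is the third <td> child of each row; the header row uses <th>
// cells, so it contributes nothing.
$cells = $xpath->query('//table[@class="stripeMe"]//tr/td[3]');

$partNumbers = [];
foreach ($cells as $cell) {
    $partNumbers[] = trim($cell->textContent);
}

print_r($partNumbers); // TR82J and 331-0536 for the sample above
```

Selecting by column position keeps working if an original part number contains lowercase letters or spaces, which the character-class pattern would miss.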