Regular expression to scrape data from an HTML table

Time: 2014-02-20 18:10:31

Tags: javascript php regex

A website shows its data in a table. The page source looks like this:

<tbody>
    <tr>
        <td></td>
        <td><a href="http://www.altassets.net/ventureforum/" target="_blank">AltAssets Venture Forum</a></td>
        <td>27 March 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Limited Partner Summit</td>
        <td>3-4 June 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Limited Partner Summit</td>
        <td>3-4 June 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>LP-GP Forum: Infrastructure &amp; Real Estate</td>
        <td>7 October 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>Envirotech &amp; Clean Energy Investor Summit</td>
        <td>4-5 November 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Fundraising &amp; IR Forum</td>
        <td>9 December 2014</td>
        <td>Hong Kong</td>
    </tr>
</tbody>

Is it possible to write a regular expression that returns the event, the date, and the city separately?

2 answers:

Answer 0: (score: 1)

You should be able to use: <td>.+?</td>
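As a rough illustration of how that pattern behaves, here is a minimal JavaScript sketch (the question is tagged javascript). The variable names `row`, `cellPattern`, and `cells` are made up for this example, and a capture group has been added around `.+?` so that only the cell text is returned; the answer itself does not include one.

```javascript
// One row copied from the question's table, held in a hypothetical variable.
const row = [
  "<tr>",
  "    <td></td>",
  "    <td>AltAssets Limited Partner Summit</td>",
  "    <td>3-4 June 2014</td>",
  "    <td>London, UK</td>",
  "</tr>",
].join("\n");

// Same pattern as in the answer, plus a capture group around the cell text.
// The g flag lets matchAll walk through every match in the string.
const cellPattern = /<td>(.+?)<\/td>/g;

// The empty <td></td> is skipped: .+? needs at least one character, and
// without the s flag . does not cross the line break that follows the cell.
const cells = Array.from(row.matchAll(cellPattern), (m) => m[1]);

const [eventName, date, city] = cells;
console.log({ eventName, date, city });
```

Note that a cell containing markup, such as the first row's link, would come back with the `<a ...>` tag still inside it, so some cleanup would likely be needed afterwards.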

Answer 1: (score: 1)

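// $s must already hold the HTML source of the table.
$matches = array();
// s lets . match newlines; U makes the quantifiers lazy by default.
preg_match_all("/<tr>(.*)<\/tr>/sU", $s, $matches);
$trs = $matches[1];
$td_matches = array();
foreach ($trs as $tr) {
    $tdmatch = array();
    // Collect the contents of every <td> inside this row.
    preg_match_all("/<td>(.*)<\/td>/sU", $tr, $tdmatch);
    $td_matches[] = $tdmatch[1];
}
print_r($td_matches);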

Put your string in $s. $td_matches then holds a nested array: one entry per TR, each containing the contents of that row's TD cells.
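Since the question is also tagged javascript, the same two-pass idea (rows first, then cells) might look roughly like this in JavaScript. This is only a sketch under a few assumptions: the helper names `cleanCell` and `parseRows` are invented here, the columns are assumed to be in the order shown in the question (blank, event, date, city), and `&amp;` is the only entity decoded because it is the only one in the sample.

```javascript
// Hypothetical excerpt of the question's markup; in practice this would be
// the full page source.
const html = `
<tbody>
    <tr>
        <td></td>
        <td><a href="http://www.altassets.net/ventureforum/" target="_blank">AltAssets Venture Forum</a></td>
        <td>27 March 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>LP-GP Forum: Infrastructure &amp; Real Estate</td>
        <td>7 October 2014</td>
        <td>London, UK</td>
    </tr>
</tbody>`;

// Remove nested tags such as <a ...> and decode the one entity used here.
function cleanCell(text) {
  return text.replace(/<[^>]*>/g, "").replace(/&amp;/g, "&").trim();
}

// First pass: each <tr>...</tr>. [\s\S]*? is lazy and also spans newlines,
// which plays the role of the s and U modifiers in the PHP version.
function parseRows(source) {
  const rows = [];
  for (const tr of source.matchAll(/<tr>([\s\S]*?)<\/tr>/g)) {
    // Second pass: every <td>...</td> inside this row, empty ones included.
    const cells = Array.from(tr[1].matchAll(/<td>([\s\S]*?)<\/td>/g), (m) =>
      cleanCell(m[1])
    );
    // Assumes the column order shown in the question: blank, event, date, city.
    const [, event, date, city] = cells;
    rows.push({ event, date, city });
  }
  return rows;
}

const events = parseRows(html);
console.log(events);
```

Each element of `events` is an object with `event`, `date`, and `city` fields, which matches the three values the question asks for. Regex-based extraction like this tends to break on attributes or nested tables, so it is best suited to markup as regular as the sample.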