Getting content out of irregular markup in PHP

Time: 2014-10-29 21:56:01

Tags: php html

I have page content like this:

<table  width="100%" >
<!--Başla--><tr>
<td><a href="http://www.example.com/duyurular/2014/ekim/kutlama.html" class="duyuru1" target="_blank">&bull; Kutlama
<br /><span class="hmk">&nbsp;&nbsp;&nbsp;&nbsp; Authority 28.10.2014</span></td></tr><tr><td><hr /></td></tr><!--Son--> 
<!--Başla--><tr>
<td><a href="http://www.example.com/duyurular/2014/ekim/genel-kurul.html" class="duyuru1" target="_blank">&bull; Genel Kurul
<br /><span class="hmk">&nbsp;&nbsp;&nbsp;&nbsp; Authority 28.10.2014</span></td></tr><tr><td><hr /></td></tr><!--Son--> 
<!--Başla--><tr>
<td><a href="http://www.example.com/duyurular/2014/ekim/katilimci.pdf" class="duyuru1" target="_blank">&bull; Katılımcı
<br /><span class="hmk">&nbsp;&nbsp;&nbsp;&nbsp; Authority 22.10.2014</span></td></tr><tr><td><hr /></td></tr><!--Son--> 
<!----duyuru başlangıc--->     
<tr >
<td ><div align="right"><a href="http://www.example.com/arsiv/duyuru/index.html" target="_blank" class="hmk"><span class="style1">Duyuru Arşivi</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</a></div>
<!-- Güncel Duyurular Bitişi--> 
</td>
</tr>
</table>

I want to get the links http://www.example.com/duyurular/2014/ekim/kutlama.html, http://www.example.com/duyurular/2014/ekim/genel-kurul.html and http://www.example.com/duyurular/2014/ekim/katilimci.pdf, the link texts Kutlama, Genel Kurul and Katılımcı, the dates, and Authority. As you can see, the markup does not follow HTML standards. I tried this:

dates

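Of course, I could not get it to work. Can you help me?

1 answer:

Answer 0: (score: 1)

Some people don't like it, but a regular expression can sometimes be used to extract content from HTML: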

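// Group 1: the quoted http(s) URL; group 2: the text after &bull; up to the next tag;
// group 3: the dotted date that follows "Authority " on the next line.
if (preg_match_all('#"(https?:[^"]+)"[^&]+&bull;\s*([^<]+).+Authority ([\d.]+)#', $html, $matches)) {
  $urls = $matches[1];
  $labels = $matches[2];
  $dates = $matches[3];
}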

$matches contains:

[1] => Array
    (
        [0] => http://www.example.com/duyurular/2014/ekim/kutlama.html
        [1] => http://www.example.com/duyurular/2014/ekim/genel-kurul.html
        [2] => http://www.example.com/duyurular/2014/ekim/katilimci.pdf
    )

[2] => Array
    (
        [0] => Kutlama

        [1] => Genel Kurul

        [2] => Katılımcı

    )

[3] => Array
    (
        [0] => 28.10.2014
        [1] => 28.10.2014
        [2] => 22.10.2014
    )

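You will probably want to trim() all of the results, because each captured label runs up to the next tag and so includes the trailing newline.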
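As a rough sketch of that trimming step, the same pattern can be run with PREG_SET_ORDER so that each match arrives as one row, which is then trimmed and stored as an associative array. The variable name $announcements and the keys url, label and date are made up for illustration, and the input is a shortened copy of the markup from the question.

```php
<?php
// Shortened sample input in the same shape as the markup in the question.
$html = <<<'HTML'
<!--Başla--><tr>
<td><a href="http://www.example.com/duyurular/2014/ekim/kutlama.html" class="duyuru1" target="_blank">&bull; Kutlama
<br /><span class="hmk">&nbsp;&nbsp;&nbsp;&nbsp; Authority 28.10.2014</span></td></tr><tr><td><hr /></td></tr><!--Son-->
<!--Başla--><tr>
<td><a href="http://www.example.com/duyurular/2014/ekim/katilimci.pdf" class="duyuru1" target="_blank">&bull; Katılımcı
<br /><span class="hmk">&nbsp;&nbsp;&nbsp;&nbsp; Authority 22.10.2014</span></td></tr><tr><td><hr /></td></tr><!--Son-->
HTML;

// Same pattern as in the answer above.
$pattern = '#"(https?:[^"]+)"[^&]+&bull;\s*([^<]+).+Authority ([\d.]+)#';

$announcements = []; // hypothetical name, one entry per matched row

// PREG_SET_ORDER groups the captures per match instead of per capture group.
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $announcements[] = [
            'url'   => trim($m[1]),
            'label' => trim($m[2]), // removes the newline captured after the link text
            'date'  => trim($m[3]),
        ];
    }
}

print_r($announcements);
```

Keeping the three values together per row avoids having to index three parallel arrays later. If the real page differs from the sample (for example, a row without the &bull; entity), the pattern would need adjusting.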