preg_match_all(): extracting multiple items from text

Date: 2013-01-20 23:44:52

Tags: php regex html-parsing preg-match preg-match-all

I have a series of items like this:

<tr>
    <td class="vertTh">
        <center>
            <a href="/browse/200" title="More from this category">Video</a>
            <br />
            (
            <a href="/browse/201" title="More from this category">Movies</a>
            )
        </center>
    </td>
    <td>
        <div class="detName">
            <a href="/torrent/8036528/Life.of.Pi.2012.DVDSCR" class="detLink" title="Details for Life.of.Pi.2012.DVDSCR">Life.of.Pi.2012.DVDSCR</a>
        </div>
        <a href="magnet:?xt=urn:btih:b129c8fd1c91b00589ef8fe646f52ce10148a3c9&dn=Life.of.Pi.2012.DVDSCR&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.istole.it%3A6969&tr=udp%3A%2F%2Ftracker.ccc.de%3A80" title="Download this torrent using magnet">
            <img src="//static.thepiratebay.se/img/icon-magnet.gif" alt="Magnet link" />
        </a>
        <img src="//static.thepiratebay.se/img/icon_comment.gif" alt="This torrent has 68 comments." title="This torrent has 68 comments." />
        <img src="//static.thepiratebay.se/img/icon_image.gif" alt="This torrent has a cover image" title="This torrent has a cover image" />
        <a href="/user/scene4all">
            <img src="//static.thepiratebay.se/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border='0' />
        </a> <font class="detDesc">Uploaded 01-18&nbsp;17:41, Size 1.25&nbsp;GiB, ULed by
            <a class="detDesc" href="/user/scene4all/" title="Browse scene4all">scene4all</a></font> 
    </td>
    <td align="right">33981</td>
    <td align="right">18487</td>
</tr>

How can I extract the fields from this with preg_match() / preg_match_all()?

I tried using this pattern:

<tr>
    <td class="vertTh">
        (?P<cat>.*?)
    </td>
    <td>
        <div class="detName">
            (?P<name>.*?)
        </div>
        (?P<link>.*?)
    </td>
    <td align="right">(?P<up>.*?)</td>
    <td align="right">(?P<down>.*?)</td>
</tr>

with this code:

preg_match_all("#$pattern#s", $item, $v);
var_dump($v);

It returns:

array(11) {
  [0]=>
  array(0) {
  }
  ["cat"]=>
  array(0) {
  }
  [1]=>
  array(0) {
  }
  ["name"]=>
  array(0) {
  }
      ...
}

Can someone help me fix this code so that it returns the actual content? I think I have provided enough information.

1 Answer:

Answer 0: (score: 2)

I would do it in four steps instead of one:

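<?php
    // Both category links (e.g. "Video" and "Movies") end up in $categories[1].
    preg_match_all('|category">([^<]*)</a>|isU', $html, $categories);
    // The item name is the text of the first link inside div.detName: $name[1].
    preg_match('|<div class="detName">[^<]*<[^>]*>([^<]*)</a>|isU', $html, $name);
    // The magnet URL is in $link[1].
    preg_match('|<a href="(magnet:[^"]*)"|isU', $html, $link);
    // The two right-aligned numeric cells are in $up_down[1][0] and $up_down[1][1].
    preg_match_all('|<td align="right">([0-9]+)</td>|isU', $html, $up_down);
?>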
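
Since the question mentions a series of items, one way to apply the four steps might be to isolate each <tr>...</tr> block first and then run the patterns on one row at a time. The following is only a sketch under some assumptions: parse_rows() is a hypothetical helper name, the sample rows are made-up and trimmed down from the markup in the question, the array keys up and down simply follow the group names used in the question's pattern, and PHP 7 or later is assumed for the type hints and the ?? operator. The four inner patterns are copied from the snippet above.

```php
<?php
// Hypothetical sample: two trimmed-down rows shaped like the question's markup.
$html = <<<'HTML'
<tr>
    <td class="vertTh">
        <center>
            <a href="/browse/200" title="More from this category">Video</a>
            <br />
            (
            <a href="/browse/201" title="More from this category">Movies</a>
            )
        </center>
    </td>
    <td>
        <div class="detName">
            <a href="/torrent/1/Example.One" class="detLink" title="Details for Example.One">Example.One</a>
        </div>
        <a href="magnet:?xt=urn:btih:aaaa&dn=Example.One" title="Download this torrent using magnet">magnet</a>
    </td>
    <td align="right">33981</td>
    <td align="right">18487</td>
</tr>
<tr>
    <td class="vertTh">
        <center>
            <a href="/browse/100" title="More from this category">Audio</a>
        </center>
    </td>
    <td>
        <div class="detName">
            <a href="/torrent/2/Example.Two" class="detLink" title="Details for Example.Two">Example.Two</a>
        </div>
        <a href="magnet:?xt=urn:btih:bbbb&dn=Example.Two" title="Download this torrent using magnet">magnet</a>
    </td>
    <td align="right">12</td>
    <td align="right">3</td>
</tr>
HTML;

// Hypothetical helper: returns one associative array per <tr> row.
function parse_rows(string $html): array
{
    $items = [];

    // Step 0: isolate each row. The "s" modifier lets "." match newlines,
    // and ".*?" stops at the first closing </tr>.
    preg_match_all('|<tr>.*?</tr>|s', $html, $rows);

    foreach ($rows[0] as $row) {
        // Steps 1 to 4: the four patterns from the answer, applied per row.
        preg_match_all('|category">([^<]*)</a>|isU', $row, $categories);
        preg_match('|<div class="detName">[^<]*<[^>]*>([^<]*)</a>|isU', $row, $name);
        preg_match('|<a href="(magnet:[^"]*)"|isU', $row, $link);
        preg_match_all('|<td align="right">([0-9]+)</td>|isU', $row, $counts);

        $items[] = [
            'cat'  => $categories[1],          // list of category names
            'name' => $name[1] ?? null,        // null if the row has no name link
            'link' => $link[1] ?? null,        // null if the row has no magnet link
            'up'   => $counts[1][0] ?? null,   // first right-aligned number
            'down' => $counts[1][1] ?? null,   // second right-aligned number
        ];
    }

    return $items;
}

$items = parse_rows($html);
print_r($items);
```

Matching small fragments per row avoids depending on the exact indentation and line breaks between tags, which a single large pattern containing literal whitespace would have to reproduce character for character.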