How to get only the links with preg_match

Time: 2014-01-07 10:06:21

Tags: url preg-match preg-match-all

Source:

<div class=filmPoster-1><a class="fImg1 entityPoster" href="/Zielona.Mila" title="Zielona mila (1999)"> bla bla bla bla
<div class=filmPoster-1><a class="fImg1 entityPoster" href="/Batman" title="Batman (1999)">

How can I get only "/Zielona.Mila,/Batman" (these links) using preg_match?

1 answer:

Answer 0 (score: 0)

The DOM way (more appropriate):

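$dom = new DOMDocument();
// "@" suppresses the warnings libxml raises on malformed HTML
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// select the href attribute of every matching <a> directly inside a filmPoster-1 div
$hrefNodes = $xpath->query('//div[@class="filmPoster-1"]/a[contains(@class, "fImg1") and contains(@class, "entityPoster")]/@href');

$links = array(); // initialize so print_r() works even when nothing matches
foreach ($hrefNodes as $hrefNode) {
    $links[] = $hrefNode->textContent;
}
print_r($links);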
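
The DOM approach above can be tried end to end with the snippet from the question. This is a minimal sketch, not a definitive implementation: the `$html` value is simply the question's sample, and it assumes PHP's DOM extension is available. It swaps the `@` operator for `libxml_use_internal_errors()`, which collects parse warnings instead of silencing them.

```php
<?php
// Sample input copied from the question (assumed to be the whole document).
$html = <<<'HTML'
<div class=filmPoster-1><a class="fImg1 entityPoster" href="/Zielona.Mila" title="Zielona mila (1999)"> bla bla bla bla
<div class=filmPoster-1><a class="fImg1 entityPoster" href="/Batman" title="Batman (1999)">
HTML;

// Collect libxml parse warnings internally instead of hiding them with "@".
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the earlier setting

$xpath = new DOMXPath($dom);
$query = '//div[@class="filmPoster-1"]/a[contains(@class, "fImg1") and contains(@class, "entityPoster")]/@href';

$links = array();
foreach ($xpath->query($query) as $hrefNode) {
    $links[] = $hrefNode->textContent;
}

// Join the hrefs with commas, the format asked for in the question.
echo implode(',', $links), "\n";
```

If the warnings are of interest, `libxml_get_errors()` can be called before `libxml_clear_errors()` to inspect them.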

The regex way:

$pattern = <<<'LOD'
~
<div\b
(?>              # possible content before the class attribute
    [^c>]++      # all that is not a "c" or a ">"
  |              # OR
    \Bc          # a "c" not preceded by a word boundary
  |              # OR
    c(?!lass\b)  # "c" not followed by "lass"
)++
class \s*+ = \s*+ ["']?  # the class attribute
(?-i) filmPoster-1 (?i) (?=["'\s>])
[^>]*+ > # end of the div tag
\s*+
<a\b
(?>
    [^>h]++
  |
    \Bh
  |
    h(?!ref\b)
)+
href \s*+ = \s*+ ["\']?
\K            # discard everything matched so far from the match result
[^\s>"\']++
~xi
LOD;

preg_match_all($pattern, $html, $links);
print_r($links[0]); // full matches only; thanks to \K each one is just the href value
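
For comparison, here is a shorter and less defensive sketch of the same idea that uses a capturing group instead of `\K`. It is an illustration only and assumes more than the pattern above does: that `class` is the first attribute after `<div`, that the `href` value is quoted, and that `$html` holds the question's sample.

```php
<?php
// Sample input copied from the question.
$html = <<<'HTML'
<div class=filmPoster-1><a class="fImg1 entityPoster" href="/Zielona.Mila" title="Zielona mila (1999)"> bla bla bla bla
<div class=filmPoster-1><a class="fImg1 entityPoster" href="/Batman" title="Batman (1999)">
HTML;

// <div class=filmPoster-1> (quotes optional), then the <a> tag,
// then the quoted href value captured in group 1.
$pattern = '~<div\s+class=["\']?filmPoster-1["\']?\s*>\s*<a\b[^>]*?\bhref=["\']([^"\']+)["\']~i';

preg_match_all($pattern, $html, $matches);
$links = $matches[1]; // group 1 holds only the href values

echo implode(',', $links), "\n"; // prints "/Zielona.Mila,/Batman"
```

With a capturing group the links end up in `$matches[1]`, while the `\K` version leaves them in index 0 of the result array. Both remain fragile against markup changes, which is why the DOM approach is the more robust choice.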