preg_match_all between two phrases

Date: 2013-09-09 12:15:53

Tags: php regex html-parsing

From the phrase:

 <div class="latestf"> <a href="http://www.x.ro/anamaria/"
 rel="nofollow"

I want to extract anamaria. How can I do this with preg_match_all?

I tried:

preg_match_all("'<div class=\"latestf\">
<a href=\"http://www.x.ro/(.*?)\" rel=\"nofollow\"'si", $source, $match);

But it doesn't work...

Thanks in advance!

3 answers:

Answer 0 (score: 1)

Try this:

$source = '<div class="latestf"> <a href="http://www.x.ro/anamaria/" rel="nofollow"';


preg_match_all('#<div\s*class="latestf">\s*<a\s*href="http://www\.x\.ro/(.*?)/?"\s*rel="nofollow"#i', $source, $match);

print_r($match);

Array
(
    [0] => Array
        (
            [0] => <div class="latestf"> <a href="http://www.x.ro/anamaria/" rel="nofollow"
        )

    [1] => Array
        (
            [0] => anamaria
        )

)
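The snippet in the question breaks the line before rel, so it may be worth checking that a whitespace-tolerant pattern still matches in that case. The following is a minimal sketch, assuming the input looks like the question's snippet; the names $pattern and $names are illustrative, and the capture group uses [^"/]+ in place of the lazy (.*?) above, which is a swapped-in variation rather than the pattern from this answer.

```php
<?php
// Sample input, assumed to mirror the question's snippet (note the line break before rel).
$source = "<div class=\"latestf\"> <a href=\"http://www.x.ro/anamaria/\"\n rel=\"nofollow\"";

// \s+ and \s* accept any run of whitespace, including newlines.
// [^"/]+ stops at the first slash or quote, so the trailing "/" is not captured.
$pattern = '#<div\s+class="latestf">\s*<a\s+href="http://www\.x\.ro/([^"/]+)/?"\s*rel="nofollow"#i';

preg_match_all($pattern, $source, $match);

// $match[1] holds the text captured by the first group, one entry per match.
$names = $match[1];
print_r($names);
```

Reading $match[1] directly avoids walking the nested array that print_r($match) shows.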

Answer 1 (score: 1)

Don't try to parse HTML with regular expressions. Use a DOM parser instead:

$html = '<div class="latestf"> <a href="http://www.x.ro/anamaria/"
 rel="nofollow"';

$dom = new DOMDocument;
@$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node)
{
    $link = $node->getAttribute("href");
}

$parsed = parse_url($link);

echo substr($parsed['path'], 1, -1);

Output:

anamaria

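If the page holds several links, it can help to restrict the lookup to the div in question and to collect every result. Here is a rough sketch of that idea, assuming the markup is closed off properly and that the DOM extension is available; the XPath expression and the $slugs variable are illustrative choices, not part of the answer above.

```php
<?php
// Sample input, assumed to mirror the question's snippet, closed off so it is complete markup.
$html = '<div class="latestf"> <a href="http://www.x.ro/anamaria/" rel="nofollow">x</a></div>';

// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

// Select only links that sit inside <div class="latestf">.
$xpath = new DOMXPath($dom);
$slugs = [];
foreach ($xpath->query('//div[@class="latestf"]//a[@href]') as $node) {
    $path = parse_url($node->getAttribute('href'), PHP_URL_PATH);
    // parse_url() yields a string only when the URL has a path; skip anything else.
    if (is_string($path)) {
        $slugs[] = trim($path, '/');
    }
}

print_r($slugs);
```

trim($path, '/') strips the leading and trailing slash whether or not the URL ends in one, which substr($path, 1, -1) does not guarantee.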

Answer 2 (score: 0)

When / is the pattern delimiter, a literal / in the pattern has to be escaped as \/:
<?php

  $source = '<div class="latestf"> <a href="http://www.x.ro/anamaria/" rel="nofollow"';

  // Dots are escaped as well so that they match a literal "." only.
  preg_match_all('/<div class="latestf"> <a href="http:\/\/www\.x\.ro\/(.*?)\/" rel="nofollow"/', $source, $match);

  var_dump($match);exit;
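Escaping is only required for the character chosen as the delimiter, so picking a different delimiter is another way around the problem. The sketch below compares the two spellings and also shows preg_quote() for building the literal part; the variable names are illustrative, and the shortened patterns are an assumption made to keep the example small.

```php
<?php
$source = '<div class="latestf"> <a href="http://www.x.ro/anamaria/" rel="nofollow"';

// With "/" as the delimiter, every literal slash in the pattern needs a backslash.
$withSlashes = '/href="http:\/\/www\.x\.ro\/(.*?)\/"/';

// With "~" as the delimiter, the slashes can be written as-is.
$withTilde = '~href="http://www\.x\.ro/(.*?)/"~';

preg_match($withSlashes, $source, $a);
preg_match($withTilde, $source, $b);

// Both patterns capture the same text.
var_dump($a[1] === $b[1]);

// preg_quote() escapes a literal string for use inside a pattern; the second
// argument names the delimiter so that it is escaped too.
$prefix = preg_quote('http://www.x.ro/', '/');
preg_match('/href="' . $prefix . '(.*?)\/"/', $source, $c);
echo $c[1], "\n"; // prints "anamaria"
```

Using preg_quote() also takes care of the dots in the host name, which would otherwise match any character.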