Get the domain name from an anchor tag's value using regex in PHP

Date: 2014-12-26 16:03:21

Tags: php regex

I have the following content mixed into a PHP string.

<div class="biz-website">
    <span class="offscreen">Business website</span>
    <a target="_blank" href="/biz_redir?url=http%3A%2F%2Fwww.example.com&amp;src_bizid=LihgJyPNjlUB3euiFvfEgw&amp;cachebuster=1419609400&amp;s=112daf4cc534d37cbf02a548cb8cb1d15bbeba6fab83b74b1195640dc44c040e">example.com</a>
</div>

I need to get example.com out of the PHP string above. Any ideas about what I am doing wrong?

1 answer:

Answer 0: (score: 2)

Regex is not the right tool for this. The right tool is a DOM parser. I like PHP's DOMDocument:

$html = <<<END
<div class="biz-website">
    <span class="offscreen">Business website</span>
    <a target="_blank" href="/biz_redir?url=http%3A%2F%2Fwww.example.com&amp;src_bizid=LihgJyPNjlUB3euiFvfEgw&amp;cachebuster=1419609400&amp;s=112daf4cc534d37cbf02a548cb8cb1d15bbeba6fab83b74b1195640dc44c040e">example.com</a>
</div>
END;

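$DOM = new DOMDocument;
$DOM->loadHTML($html);

// Collect every <a> element in the parsed document.
$aTags = $DOM->getElementsByTagName('a');

// The text content of the first anchor is the displayed domain.
$value = $aTags->item(0)->nodeValue;
echo $value;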

Update: If you want to check whether the href contains "biz_redir", you can simply test for it:

$html = <<<END
<div class="biz-website">
    <span class="offscreen">Business website</span>
    <a target="_blank" href="/biz_redir?url=http%3A%2F%2Fwww.example.com&amp;src_bizid=LihgJyPNjlUB3euiFvfEgw&amp;cachebuster=1419609400&amp;s=112daf4cc534d37cbf02a548cb8cb1d15bbeba6fab83b74b1195640dc44c040e">example.com</a>
</div>
END;

$DOM = new DOMDocument;
$DOM->loadHTML($html);

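$aTags = $DOM->getElementsByTagName('a');
$aTag = $aTags->item(0);

// Print the link text only when an anchor exists and its href goes through the redirect.
if ($aTag !== null && strpos($aTag->getAttribute('href'), 'biz_redir') !== false) {
    $value = $aTag->nodeValue;
    echo $value;
}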
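
If the link text is ever shortened or differs from the real target, the host could also be read from the `url` parameter of the redirect href. The following is only a sketch under assumptions: `hostFromRedirectHref` is a hypothetical helper name, the parameter is assumed to always be called `url` as in the sample markup, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the original answer): extract the target
// host from a redirect-style href such as
// /biz_redir?url=http%3A%2F%2Fwww.example.com&src_bizid=...
function hostFromRedirectHref(string $href): ?string
{
    // DOMElement::getAttribute() returns the value with entities decoded,
    // so the query string is expected to use plain "&" separators here.
    $query = parse_url($href, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string at all
    }

    // parse_str() splits the query and URL-decodes each value.
    parse_str($query, $params);
    if (empty($params['url']) || !is_string($params['url'])) {
        return null; // assumed parameter name "url" is missing
    }

    $host = parse_url($params['url'], PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

$href = '/biz_redir?url=http%3A%2F%2Fwww.example.com&src_bizid=LihgJyPNjlUB3euiFvfEgw&cachebuster=1419609400';
echo hostFromRedirectHref($href), PHP_EOL;
```

Note that this yields the host as it appears in the redirect target (including any `www.` prefix), which may differ from the anchor text.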

Update 2: If you have the whole web page rather than just that snippet, you can find the <div> you want with XPath:

$DOM = new DOMDocument;
$DOM->loadHTML($html);
$xPath = new DOMXPath($DOM);

// Select <a> children of <div class="biz-website"> whose href contains "biz_redir".
$biz = $xPath->query('//div[@class="biz-website"]/a[contains(@href, "biz_redir")]');

$value = $biz->item(0)->nodeValue;
echo $value;
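
Real pages often contain markup that makes libxml emit warnings, and the query may match nothing, in which case `item(0)` is null. A defensive variant of the XPath lookup might look like the sketch below. The function name `bizWebsiteText` is hypothetical, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical wrapper (not part of the original answer): return the text of
// the business-website link, or null when the page has no such link.
function bizWebsiteText(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//div[@class="biz-website"]/a[contains(@href, "biz_redir")]');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no matching anchor on this page
    }

    return trim($nodes->item(0)->nodeValue);
}

$page = '<html><body><div class="biz-website">'
      . '<a href="/biz_redir?url=http%3A%2F%2Fwww.example.com">example.com</a>'
      . '</div></body></html>';
echo bizWebsiteText($page), PHP_EOL;
```

Returning null rather than failing lets the caller decide how to handle pages that lack the link.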