Remove all href "wrappers" except those whose href contains a specific value: PHP

Date: 2017-03-12 10:00:59

Tags: php filter preg-replace

Content:

<a href="http://www.lipsum.com/">Lorem Ipsum</a> is simply dummy text 
of the printing and typesetting industry. 
<a href="http://www.google.com/1111/2222/3333">Lorem Ipsum</a> has been the industrys 
standard dummy text ever since the 1500s, when an unknown printer 
took a <a href="http://gallery.com">galley</a> of type and scrambled 
it to make a type specimen <a href="http://www.google.com/1111/3333/4444">book</a>.

The content includes 4 "a href" links:

http://www.lipsum.com/
http://www.google.com/1111/2222/3333
http://www.google.com/1111/3333/4444
http://gallery.com/

I want this result: only links whose href value starts with href="http://www.google.com/1111/3333**** are kept; every other link is reduced to its text.

Lorem Ipsum is simply dummy text of the printing and typesetting industry. 
Lorem Ipsum has been the industrys standard dummy text ever since the 1500s,
when an unknown printer took a galley of type and scrambled it to make a type 
specimen <a href="http://www.google.com/1111/3333/4444">book</a>.

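Does anyone know how to do this? I hope the question is clear. Thanks in advance.

1 answer:

Answer 0 (score: 1)

Parsing or transforming HTML with regular expressions is generally not a good idea. But for your small fragment, and given that you need to keep the link text (e.g. "Lorem Ipsum") while removing the link itself, you can use the following preg_replace solution: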

$html = '<a href="http://www.lipsum.com/">Lorem Ipsum</a> is simply dummy text 
of the printing and typesetting industry. 
<a href="http://www.google.com/1111/2222/3333">Lorem Ipsum</a> has been the industrys 
standard dummy text ever since the 1500s, when an unknown printer 
took a <a href="http://gallery.com">galley</a> of type and scrambled 
it to make a type specimen <a href="http://www.google.com/1111/3333/4444">book</a>.';

// Match every link whose href fails the lookahead and replace it with its text (group 1).
$re = '/<a href="http:\/\/(?!www\.google\.com\/1111\/3+\/[^>]+).*?>([^<>]+)<\/a>/m';
$result = preg_replace($re, '$1', $html);

echo $result;

Output:

Lorem Ipsum is simply dummy text 
of the printing and typesetting industry. 
Lorem Ipsum has been the industrys 
standard dummy text ever since the 1500s, when an unknown printer 
took a galley of type and scrambled 
it to make a type specimen <a href="http://www.google.com/1111/3333/4444">book</a>.

(?!www\.google\.com\/1111\/3+\/[^>]+) - a negative lookahead assertion: the pattern matches only those links whose href attribute value does not have the required form href="http://www.google.com/1111/3333****
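
One caveat with the pattern above: `3+` matches one or more `3` characters, so a link such as http://www.google.com/1111/3/x would also be kept. A stricter variant, sketched here under the assumption that the prefix to keep is exactly http://www.google.com/1111/3333, could look like this. The helper name `unwrapLinksExcept` and the `$sample` string are hypothetical and are not part of the original answer.

```php
<?php
// Hypothetical helper: unwraps every <a> whose href does not start with
// $keepPrefix, leaving only the link text in its place.
function unwrapLinksExcept(string $html, string $keepPrefix): string
{
    // preg_quote() escapes regex metacharacters in the prefix;
    // '/' is passed as the pattern delimiter so it is escaped as well.
    $prefix = preg_quote($keepPrefix, '/');

    // (?!...) rejects hrefs that start with the prefix, [^"]* stays inside
    // the attribute value, and ([^<>]+) captures plain link text only.
    $re = '/<a\s+href="(?!' . $prefix . ')[^"]*"[^>]*>([^<>]+)<\/a>/i';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($re, '$1', $html) ?? $html;
}

$sample = '<a href="http://www.lipsum.com/">Lorem Ipsum</a> and '
        . '<a href="http://www.google.com/1111/3/x">three</a> and '
        . '<a href="http://www.google.com/1111/3333/4444">book</a>.';

echo unwrapLinksExcept($sample, 'http://www.google.com/1111/3333'), "\n";
// prints: Lorem Ipsum and three and <a href="http://www.google.com/1111/3333/4444">book</a>.
```

Like the original pattern, this only handles links whose text contains no nested tags, and it assumes href is the first attribute and is double-quoted.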

----------

A more accurate approach is to use the DOMDocument / DOMXPath classes:

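$dom = new \DOMDocument();
$dom->loadHTML($html);
$xpath = new \DOMXPath($dom);

// Select every <a> whose href does not contain the URL prefix to keep.
$nodes = $xpath->query("//a[not(contains(@href, 'http://www.google.com/1111/3333'))]");
foreach ($nodes as $n) {
    // Replace the link element with a plain text node holding its text.
    $n->parentNode->replaceChild($dom->createTextNode($n->nodeValue), $n);
}

// Serialize from the root element, which omits the doctype.
echo $dom->saveHTML($dom->documentElement);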

Output:

<html><body>Lorem Ipsum is simply dummy text 
of the printing and typesetting industry. 
Lorem Ipsum has been the industrys 
standard dummy text ever since the 1500s, when an unknown printer 
took a galley of type and scrambled 
it to make a type specimen <a href="http://www.google.com/1111/3333/4444">book</a>.</body></html>
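
If the `<html><body>` wrapper is unwanted, or the match should be a strict "starts with" rather than `contains`, the DOM approach can be adapted roughly as follows. This is only a sketch: the helper name `unwrapLinksWithDom` and the `wrapper` id are hypothetical, the prefix check is done in PHP rather than in XPath to avoid quoting issues, and the input is assumed to be a well-formed fragment without encoding surprises.

```php
<?php
// Hypothetical helper: keeps only links whose href starts with $keepPrefix
// and returns the fragment without the <html>/<body> wrapper.
function unwrapLinksWithDom(string $html, string $keepPrefix): string
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $dom = new DOMDocument();
    // Wrap the fragment in a <div> so its content can be found again later.
    $dom->loadHTML('<div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // The XPath result is a static list, so nodes can be removed while iterating.
    foreach ($xpath->query('//a') as $a) {
        $href = $a->getAttribute('href');
        if (strncmp($href, $keepPrefix, strlen($keepPrefix)) === 0) {
            continue; // href starts with the prefix: keep this link
        }
        // Move the link's children in front of it, then drop the empty <a>.
        // Unlike a text-node replacement, this preserves nested markup.
        while ($a->firstChild !== null) {
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        $a->parentNode->removeChild($a);
    }

    // Serialize only the children of the wrapper <div>.
    $wrapper = $xpath->query('//div[@id="wrapper"]')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$fragment = '<a href="http://gallery.com">galley</a> of type '
          . '<a href="http://www.google.com/1111/3333/4444">book</a>.';

echo unwrapLinksWithDom($fragment, 'http://www.google.com/1111/3333'), "\n";
```

Checking the prefix with `strncmp` instead of interpolating it into the XPath expression means a prefix containing quotes cannot break the query.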