Question

Content:

<a href="http://www.lipsum.com/">Lorem Ipsum</a> is simply dummy text 
of the printing and typesetting industry. 
<a href="http://www.google.com/1111/2222/3333">Lorem Ipsum</a> has been the industrys 
standard dummy text ever since the 1500s, when an unknown printer 
took a <a href="http://gallery.com">galley</a> of type and scrambled 
it to make a type specimen <a href="http://www.google.com/1111/3333/4444">book</a>.

The content includes 4 "a href" links:

http://www.lipsum.com/
http://www.google.com/1111/2222/3333
http://www.google.com/1111/3333/4444
http://gallery.com/

I want this result: the only link kept is the one whose href value starts with href="http://www.google.com/1111/3333****

Lorem Ipsum is simply dummy text of the printing and typesetting industry. 
Lorem Ipsum has been the industrys standard dummy text ever since the 1500s,
when an unknown printer took a galley of type and scrambled it to make a type 
specimen <a href="http://www.google.com/1111/3333/4444">book</a>.

Does anyone know how to do this? I hope the question is clear. Thanks in advance.

Answer 1

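Parsing or transforming HTML content with regular expressions is generally not a good idea. For your small snippet, though, and given that you need to keep the link text (e.g. "Lorem Ipsum") while removing the link itself, you can use the following preg_replace solution: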

$html = '<a href="http://www.lipsum.com/">Lorem Ipsum</a> is simply dummy text 
of the printing and typesetting industry. 
<a href="http://www.google.com/1111/2222/3333">Lorem Ipsum</a> has been the industrys 
standard dummy text ever since the 1500s, when an unknown printer 
took a <a href="http://gallery.com">galley</a> of type and scrambled 
it to make a type specimen <a href="http://www.google.com/1111/3333/4444">book</a>.';

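// Match every <a> whose href does not begin with http://www.google.com/1111/3.../ and capture its text.
$re = '/<a href="http:\/\/(?!www\.google\.com\/1111\/3+\/[^>]+).*?>([^<>]+)<\/a>/m';
// Replace each matched link with its captured text only.
$result = preg_replace($re, "$1", $html);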

echo $result;

Output:

Lorem Ipsum is simply dummy text 
of the printing and typesetting industry. 
Lorem Ipsum has been the industrys 
standard dummy text ever since the 1500s, when an unknown printer 
took a galley of type and scrambled 
it to make a type specimen <a href="http://www.google.com/1111/3333/4444">book</a>.

(?!www\.google\.com\/1111\/3+\/[^>]+) - a negative lookahead assertion: the pattern matches only those links whose href attribute value does not fit the required form href="http://www.google.com/1111/3333****
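
To see how the lookahead decides link by link, here is a minimal sketch that applies the same pattern to three of the links from the question separately. The $samples and $results names are only illustrative, and the pattern is written with ~ delimiters (an assumption for readability, so the slashes need no escaping); it is otherwise the same expression.

```php
<?php
// Same pattern as above, with '~' as the delimiter so '/' needs no escaping.
$re = '~<a href="http://(?!www\.google\.com/1111/3+/[^>]+).*?>([^<>]+)</a>~';

// Illustrative inputs taken from the question's content.
$samples = [
    '<a href="http://www.lipsum.com/">Lorem Ipsum</a>',
    '<a href="http://www.google.com/1111/2222/3333">Lorem Ipsum</a>',
    '<a href="http://www.google.com/1111/3333/4444">book</a>',
];

$results = [];
foreach ($samples as $link) {
    // When the lookahead rejects the href, nothing matches and the link is left untouched.
    $results[] = preg_replace($re, '$1', $link);
}

echo implode("\n", $results), "\n";
```

The first two links are reduced to their text because their href does not continue with www.google.com/1111/3.../ right after http://, while the third one is returned unchanged. Note that 3+ accepts any run of the digit 3, so a path such as /1111/3/ would also be kept.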

----------

A more accurate approach is to use the DOMDocument / DOMXPath classes:

$dom = new \DOMDocument();
$dom->loadHTML($html);
$xpath = new \DOMXPath($dom);

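// Select every <a> whose href does not contain the required URL prefix.
$nodes = $xpath->query("//a[not(contains(@href, 'http://www.google.com/1111/3333'))]");
foreach ($nodes as $n) {
    // Swap the link element for a plain text node holding its text.
    $n->parentNode->replaceChild($dom->createTextNode($n->nodeValue), $n);
}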

echo $dom->saveHTML($dom->documentElement);

Output:

<html><body>Lorem Ipsum is simply dummy text 
of the printing and typesetting industry. 
Lorem Ipsum has been the industrys 
standard dummy text ever since the 1500s, when an unknown printer 
took a galley of type and scrambled 
it to make a type specimen <a href="http://www.google.com/1111/3333/4444">book</a>.</body></html>
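
If the html/body wrapper in that output is unwanted, or the match should be anchored at the start of the href rather than anywhere inside it, one possible variation is sketched below. It is only a sketch under a few assumptions: the input is shortened to the last two links of the question, the fragment is wrapped in a hypothetical div so there is a single element to read back, and the XPath 1.0 function starts-with() replaces contains().

```php
<?php
// Shortened, illustrative input: the last two links from the question.
$html = 'took a <a href="http://gallery.com">galley</a> of type and scrambled '
      . 'it to make a type specimen <a href="http://www.google.com/1111/3333/4444">book</a>.';

$dom = new DOMDocument();
// Wrap the fragment in a <div> (an assumption of this sketch) so its content can be read back later.
$dom->loadHTML('<div>' . $html . '</div>');
$xpath = new DOMXPath($dom);

// starts-with() only accepts hrefs that begin with the required prefix.
$nodes = $xpath->query("//a[not(starts-with(@href, 'http://www.google.com/1111/3333'))]");
foreach ($nodes as $n) {
    // Replace the link element with a text node holding its text.
    $n->parentNode->replaceChild($dom->createTextNode($n->textContent), $n);
}

// Serialize only the children of the wrapper <div>, leaving out html/body/div.
$div = $dom->getElementsByTagName('div')->item(0);
$result = '';
foreach ($div->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}

echo $result, "\n";
```

With contains(), a link such as http://example.com/?u=http://www.google.com/1111/3333 would also be kept, because the prefix appears somewhere inside its href; starts-with() avoids that case.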

Remove all href "wrappers" except those whose href contains a specific value: PHP
