Question

I want to remove the anchor tags around text in my content, but only for specific URLs; all other anchors must be kept:

this is example text <a href="www.1.com">hello</a> and
this is second link <a href="www.2.com">hello word two</a>
this is third link <a href="www.3.com">hello word three</a>
this is fourth link <a href="www.4.com">hello word four</a>

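I only want to remove the anchors whose href is www.1.com or www.2.com and keep the others. At the moment I am using the following code, which removes all anchor tags: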

preg_replace( '/<a[^>]+>([^<]+)<\/a>/i','\1', $content )

Please help.

Answer 1

Use str_replace instead to replace your anchor tags with an empty string: http://www.w3schools.com/php/func_string_str_replace.asp

or

$str = 'www.1.com';
echo trim(preg_replace('/<[^>]*>/', '', $str));
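
One possible reading of the str_replace suggestion, as a minimal sketch: it assumes the exact markup of each unwanted link is known in advance, which is an assumption not stated in the answer. The sample input is shortened from the question, and the variable names are illustrative.

```php
<?php
$content = 'this is example text <a href="www.1.com">hello</a> and
this is third link <a href="www.3.com">hello word three</a>';

// Assumption: the full markup of every link to strip is known ahead of time.
// Each search string is replaced by the link text alone.
$search  = ['<a href="www.1.com">hello</a>'];
$replace = ['hello'];

$result = str_replace($search, $replace, $content);
echo $result;
```

Because str_replace only matches literal strings, any variation in attributes, quoting, or link text would leave a link untouched.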

Answer 2

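It is well known that regular expressions are not the safest tool for manipulating HTML.

I suggest parsing the string with DOMDocument, finding all a tags whose href attribute value contains www.1.com or www.2.com, and removing only those: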

$html = "<html><head></head><body>TEXTthis is example text <a href=\"www.1.com\">hello</a> and
this is second link <a href=\"www.2.com\">hello word two</a>
this is third link <a href=\"www.3.com\">hello word three</a>
this is fourth link <a href=\"www.4.com\">hello word four</a></body></html>"; 
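$dom = new DOMDocument();
// Parse without adding an implied <html>/<body> wrapper or a doctype
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
// Select only the links whose href contains one of the two target hosts
$links = $xp->query('//a[contains(@href,"www.1.com") or contains(@href,"www.2.com")]');
foreach ($links as $link) {
    // Note: this removes the whole <a> element, including its text
    $link->parentNode->removeChild($link);
}
echo $dom->saveHTML();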

See this PHP demo.
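
The snippet above deletes each matched link together with its text. If the goal is to keep the text and drop only the surrounding tag, as in the question's preg_replace attempt, a variant along these lines might work. It is a sketch, not part of the original answer: the function name unwrapLinks, the wrapping div, and the $hosts parameter are all illustrative assumptions.

```php
<?php
// Hypothetical helper: unwrap links to the given hosts, keeping their content.
// Assumes the host strings contain no double quotes (they are put into XPath).
function unwrapLinks(string $html, array $hosts): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings quiet
    // Wrap the fragment in a <div> so there is a single root element.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    $conditions = array_map(
        fn($host) => sprintf('contains(@href, "%s")', $host),
        $hosts
    );
    $links = $xp->query('//a[' . implode(' or ', $conditions) . ']');

    foreach ($links as $link) {
        // Move every child node in front of the <a>, then drop the empty <a>.
        while ($link->firstChild) {
            $link->parentNode->insertBefore($link->firstChild, $link);
        }
        $link->parentNode->removeChild($link);
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$sample = 'first <a href="www.1.com">hello</a> third <a href="www.3.com">hello word three</a>';
echo unwrapLinks($sample, ['www.1.com', 'www.2.com']);
```

Moving the child nodes rather than copying textContent also preserves any nested markup inside the link.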

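Regular expressions should only be considered a last resort, especially when you are dealing with broken HTML that you cannot fix. In that case, a fallback solution could be the regex '~<a\s[^<]*?\bhref="www\.[12]\.com"[^<]*?>[^<]*<\/a>~i', which matches a tags whose href value is equal to www.1.com or www.2.com. Or use '~<a\s[^<]*?\bhref="[^<"]*?www\.[12]\.com[^<]*?>[^<]*<\/a>~i' if the href may merely contain these domain names.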
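
As a rough sketch of how the first fallback pattern could be used to keep the link text, the version below adds a capture group around the text and substitutes it back. The capture group and the function name stripSelectedAnchors are additions for illustration, not part of the original pattern.

```php
<?php
// Hypothetical helper: strip anchors whose href is exactly www.1.com or
// www.2.com, keeping the link text through capture group 1.
function stripSelectedAnchors(string $content): string
{
    // Same shape as the fallback pattern, with ([^<]*) capturing the text.
    $pattern = '~<a\s[^<]*?\bhref="www\.[12]\.com"[^<]*?>([^<]*)</a>~i';
    return preg_replace($pattern, '$1', $content);
}

$content = 'this is example text <a href="www.1.com">hello</a> and
this is second link <a href="www.2.com">hello word two</a>
this is third link <a href="www.3.com">hello word three</a>';

echo stripSelectedAnchors($content);
```

Like the pattern it is based on, this assumes double-quoted href values and link text without nested tags.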

Remove anchor tags around text in content
