Question

How can I exclude a specific domain (for example, one.com) from the href match?

My current code:

$str = 'This string has <a href="http://one.com">one link</a> and <a href="http://two.com">another link</a>';
$str = preg_replace('~<a href="(https?://[^"]+)".*?>.*?</a>~', '$1', $str);
echo $str; // This string has http://one.com and http://two.com

Desired result:

This string has <a href="http://one.com">one link</a> and http://two.com

Answer 1

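Using regex

If you want to use a regular expression for this task, you can use a negative lookahead. It asserts that the // part of the href attribute is not followed by one.com. Note that a lookaround assertion does not consume any characters.

Here is what the regex looks like: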

<a href="(https?://(?!one\.com)[^"]+)".*?>.*?</a>
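
As a quick check, the pattern can be dropped into the preg_replace call from the question. This is a minimal sketch that reuses the question's $str; the variable names $pattern and $result are only illustrative.

```php
<?php
$str = 'This string has <a href="http://one.com">one link</a> and <a href="http://two.com">another link</a>';

// (?!one\.com) is tested right after "://" and consumes nothing, so for links
// that pass the check [^"]+ still captures the whole URL into group 1.
$pattern = '~<a href="(https?://(?!one\.com)[^"]+)".*?>.*?</a>~';

$result = preg_replace($pattern, '$1', $str);

echo $result, PHP_EOL;
// This string has <a href="http://one.com">one link</a> and http://two.com
```

Because the lookahead fails for the one.com link, that link is never matched and is left untouched, while the two.com link is replaced by its URL.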

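Using a DOM parser

Although this is a fairly simple task, the proper way to do it is with a DOM parser. That way, you do not have to change the regex if the format of your markup changes in the future. The regex solution breaks if the <a> node carries additional attributes, for example one placed before href. To avoid these problems, you can let a DOM parser, such as PHP's DOMDocument, handle the parsing:

Here is what the solution looks like: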

$dom = new DOMDocument(); 
$dom->loadHTML($html); // $html is the string containing markup

$links = $dom->getElementsByTagName('a');

// Loop backwards (the node list is live) and replace each link with its URL
for ($i = $links->length - 1; $i >= 0; $i--) {
    $node = $links->item($i);

    $href = $node->getAttribute('href');

    // Keep the link to one.com; replace every other link with its href,
    // which is what the expected output in the question shows
    if ($href !== 'http://one.com') {
        $newTextNode = $dom->createTextNode($href);
        $node->parentNode->replaceChild($newTextNode, $node);
    }
}

echo $dom->saveHTML();
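
The DOM approach can also be sketched as a self-contained function applied to the question's $str. This version assumes each link should be replaced by its href (as in the question's expected output), compares the parsed host rather than the whole URL, and returns only the fragment instead of the full document that saveHTML() emits. The helper name unlinkExcept and the wrapping <div> are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: replace every link with its URL,
// except links whose host equals $keepHost.
function unlinkExcept(string $html, string $keepHost): string
{
    $dom = new DOMDocument();
    // Wrap the fragment so it can be located again after parsing.
    $dom->loadHTML('<div>' . $html . '</div>');
    $wrapper = $dom->getElementsByTagName('div')->item(0);

    $links = $wrapper->getElementsByTagName('a');
    // Iterate backwards: the node list is live and shrinks as links are replaced.
    for ($i = $links->length - 1; $i >= 0; $i--) {
        $node = $links->item($i);
        $href = $node->getAttribute('href');
        $host = strtolower((string) parse_url($href, PHP_URL_HOST));

        if ($host !== $keepHost) {
            $node->parentNode->replaceChild($dom->createTextNode($href), $node);
        }
    }

    // Serialize only the wrapper's children, not the <html>/<body>
    // elements that loadHTML() adds around the fragment.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$str = 'This string has <a href="http://one.com">one link</a> and <a href="http://two.com">another link</a>';
$result = unlinkExcept($str, 'one.com');
echo $result, PHP_EOL;
```

Comparing the host means that links such as http://one.com/page or https://one.com are kept as well, which an exact string comparison against 'http://one.com' would not do. Input containing non-ASCII text may need an explicit encoding hint before it is passed to loadHTML().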

Answer 2

This should do it:

<a href="(https?://(?!one\.com)[^"]+)".*?>.*?</a>

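We use a negative lookahead to make sure that one.com does not appear directly after https?://.

If you also want to check for certain subdomains of one.com, use this example: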

<a href="(https?://(?!((www|example)\.)?one\.com)[^"]+)".*?>.*?</a>

Here we optionally check for www. or example. before one.com. This, however, still matches a URL like misc.one.com. If you want to exclude all subdomains of one.com, use this:

<a href="(https?://(?!([^.]+\.)?one\.com)[^"]+)".*?>.*?</a>
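
To see how the two subdomain patterns differ, the sketch below runs both against a handful of made-up hosts and records which links get replaced. The host list and the array keys are illustrative assumptions; only the two patterns come from the answer.

```php
<?php
$patterns = [
    'www_or_example' => '~<a href="(https?://(?!((www|example)\.)?one\.com)[^"]+)".*?>.*?</a>~',
    'any_subdomain'  => '~<a href="(https?://(?!([^.]+\.)?one\.com)[^"]+)".*?>.*?</a>~',
];
$hosts = ['one.com', 'www.one.com', 'misc.one.com', 'a.b.one.com', 'two.com'];

$replaced = [];
foreach ($patterns as $name => $pattern) {
    foreach ($hosts as $host) {
        $link = '<a href="http://' . $host . '">link</a>';
        // true when the pattern matched and the link was replaced by its URL
        $replaced[$name][$host] = preg_replace($pattern, '$1', $link) !== $link;
    }
}

foreach ($replaced as $name => $row) {
    foreach ($row as $host => $wasReplaced) {
        printf("%-15s %-13s %s\n", $name, $host, $wasReplaced ? 'replaced' : 'kept');
    }
}
```

With these inputs, the first pattern keeps one.com and www.one.com but replaces misc.one.com, while the second also keeps misc.one.com. Note that ([^.]+\.)? covers only a single subdomain label, so a nested host such as a.b.one.com is still replaced by both patterns.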
