Question

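I am looking for a regular expression that matches links without a trailing dot. I know that an FQDN always ends with the root dot, but I am working on a blog service. I need to process blog posts, and apparently some users end their posts with a link and then finish the sentence with a dot.

Such text looks like this: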

Example text... https://example.com/site. More text here...

The problem is that the resulting link does not point to any web page. With the help of this question, I came up with the following PHP function:

function modifyText($text) {
    $url = '/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/';
    $string= preg_replace($url, '<a href="$0" target="_blank">$0</a>', $text);
    return $string;
}

With the example above, this code produces

Example text... <a href="https://example.com/site." target="_blank">https://example.com/site.</a> More text here...

but it should produce

Example text... <a href="https://example.com/site" target="_blank">https://example.com/site</a>. More text here...

Answer 1

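One option is to make the repetition of non-whitespace characters at the end lazy, and to look ahead for zero or more dots (.) followed by whitespace or the end of the string: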

'/https?:\/\/[a-z0-9.-]+\.[a-z]{2,3}(\/\S*?(?=\.*(?:\s|$)))?/i'

https://regex101.com/r/4VEWjW/2
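
To see what the lazy quantifier and the lookahead do together, here is a minimal sketch that runs the pattern above through preg_match. The helper name firstUrl is made up for this illustration and is not part of the original answer.

```php
<?php
// Hypothetical helper (illustration only): returns the first URL matched by
// the lazy-quantifier pattern, or null when nothing matches.
function firstUrl(string $text): ?string {
    // \S*? grows one character at a time until the lookahead sees only
    // dots followed by whitespace or the end of the string.
    $pattern = '/https?:\/\/[a-z0-9.-]+\.[a-z]{2,3}(\/\S*?(?=\.*(?:\s|$)))?/i';
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

// Sample sentence from the question: the sentence-ending dot is not matched.
echo firstUrl('Example text... https://example.com/site. More text here...'), "\n";
// A dot inside the path is kept; only the trailing dot is left out.
echo firstUrl('See https://example.com/index.html. Done'), "\n";
```

This should print https://example.com/site and then https://example.com/index.html.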

You could also match optional dots followed by non-dot characters, which avoids the lazy quantifier:

'/https?:\/\/[a-z0-9.-]+\.[a-z]{2,3}(\/\.*[^.]+(?=\.*(?:\s|$)))?/i'
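
The following sketch tries the non-lazy variant on two inputs. The helper name firstUrlNoLazy is hypothetical. The second call probes a case the answer does not discuss: a path that itself contains a dot.

```php
<?php
// Hypothetical helper (illustration only): first match of the non-lazy variant.
function firstUrlNoLazy(string $text): ?string {
    $pattern = '/https?:\/\/[a-z0-9.-]+\.[a-z]{2,3}(\/\.*[^.]+(?=\.*(?:\s|$)))?/i';
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

// Sample sentence from the question: the trailing dot is left out of the match.
echo firstUrlNoLazy('Example text... https://example.com/site. More text here...'), "\n";

// Path with an inner dot: [^.]+ stops at the first dot, the lookahead then
// fails, and the optional path group is dropped from the match.
echo firstUrlNoLazy('See https://example.com/index.html. Done'), "\n";
```

The first call should print https://example.com/site. The second prints only https://example.com, so this variant seems best suited to paths that contain no dots. Note also that [^.] accepts whitespace, so the path can run past a space up to the next dot.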

Answer 2

Another option is to place a negative lookbehind (?<!\.) after \S* to assert that what is directly to the left is not a dot:

https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:\/\S*(?<!\.))?

Regex demo | Php demo
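
As a rough illustration of how the lookbehind forces \S* to give back trailing dots, the sketch below collects all matches with preg_match_all. The helper name findUrls and the sample sentence are assumptions made for this example.

```php
<?php
// Hypothetical helper (illustration only): every URL the lookbehind pattern
// finds in a piece of text.
function findUrls(string $text): array {
    // \S* first grabs everything up to the next whitespace; (?<!\.) then
    // makes it backtrack until the last matched character is not a dot.
    $pattern = '/https?:\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:\/\S*(?<!\.))?/';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

print_r(findUrls('Read https://example.com/a.html. Or https://example.com/b... The end'));
```

The returned array should contain https://example.com/a.html and https://example.com/b, so a dot inside the path survives while one or more trailing dots are dropped.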

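If you don't need the capturing groups (), you can turn them into non-capturing groups (?:).

If you use a delimiter other than /, for example ~, you don't have to escape the forward slash as \/.

For example: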

function modifyText($text) {
    $url = '~https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:/\S*(?<!\.))?~';
    $string= preg_replace($url, '<a href="$0" target="_blank">$0</a>', $text);
    return $string;
}

echo modifyText("Example text... https://example.com/site. More text here... https://example.com/site");

Output

Example text... <a href="https://example.com/site" target="_blank">https://example.com/site</a>. More text here... <a href="https://example.com/site" target="_blank">https://example.com/site</a>
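
As a quick check of the delimiter remark in this answer, the sketch below writes the same pattern once with / and once with ~ as the delimiter and compares the results. The variable names are chosen for illustration only.

```php
<?php
// Two spellings of the same pattern: with "/" as the delimiter the slashes
// must be escaped, with "~" as the delimiter they can stay as they are.
$withSlash = '/https?:\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:\/\S*(?<!\.))?/';
$withTilde = '~https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:/\S*(?<!\.))?~';

$text = 'Example text... https://example.com/site. More text here...';
$replacement = '<a href="$0" target="_blank">$0</a>';

$a = preg_replace($withSlash, $replacement, $text);
$b = preg_replace($withTilde, $replacement, $text);

// Both spellings should produce the same HTML.
var_dump($a === $b);
```

Which delimiter to pick is a matter of readability: for URL patterns, ~ or # saves escaping every slash.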

Regular expression for links without a trailing dot
