Question

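We use the following regular expression to convert URLs in text into links, shortening them with an ellipsis in the middle if they are too long: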

/**
 * Replace all links with <a> tags (shortening them if needed)
 */
$match_arr[] = '/((http|ftp)+(s)?:\/\/[^<>\s,!\)]+)/ie';
$replace_arr[] = "'<a href=\"\\0\" title=\"\\0\" target=\"_blank\">' . " .
    "( mb_strlen( '$0' ) > {$maxlength} ? mb_substr( '$0', 0, " . ( $maxlength / 2 ) . " ) . '…' . " .
    "mb_substr( '$0', -" . ( $maxlength / 2 ) . " ) : '$0' ) . " .
"'</a>'";

This works well. However, I found that if the text already contains a link, for example:

$text = '... <a href="http://www.google.com">http://www.google.com</a> ...';

it matches both URLs, so it tries to create two more <a> tags, which of course completely messes up the DOM.

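How can I prevent the regex from matching when the link is already inside an <a> tag? The URL can also appear in the title attribute, so basically I just want to skip every <a> tag entirely.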

Answer 1

The simplest way (with regex, which is arguably not the most reliable tool in this case) is probably to make sure the link is not followed by </a>:

#(http|ftp)+(s)?://[^<>\s,!\)]++(?![^<]*</a>)#ie

I'm using possessive quantifiers to make sure the whole URL is matched (i.e. no backtracking to satisfy the lookahead).
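As a rough sketch of how this pattern could be wired into the replacement, here is a version using preg_replace_callback instead of the /e modifier (which was removed in PHP 7). The function name linkify() and the parameter $maxLength are made up for illustration and are not from the question. The htmlspecialchars() call is also my addition, on the assumption that a stray quote in a match should not break the attribute.

```php
<?php
// Hypothetical helper: linkify() and $maxLength are illustrative names only.
function linkify(string $text, int $maxLength = 40): string
{
    // Pattern from the answer, without the "e" modifier.
    // [^<>\s,!\)]++ is possessive, so the URL cannot be shortened by backtracking.
    // (?![^<]*</a>) rejects a URL that is followed by </a> with no "<" in between.
    $pattern = '#(http|ftp)+(s)?://[^<>\s,!\)]++(?![^<]*</a>)#i';

    return preg_replace_callback($pattern, function (array $m) use ($maxLength): string {
        $url  = $m[0];
        $half = intdiv($maxLength, 2);

        // Shorten long URLs in the middle, as in the question's code.
        $label = mb_strlen($url) > $maxLength
            ? mb_substr($url, 0, $half) . '…' . mb_substr($url, -$half)
            : $url;

        // Assumption: escape quotes, but do not double-encode existing entities.
        $href  = htmlspecialchars($url, ENT_QUOTES, 'UTF-8', false);
        $label = htmlspecialchars($label, ENT_QUOTES, 'UTF-8', false);

        return '<a href="' . $href . '" title="' . $href . '" target="_blank">'
            . $label . '</a>';
    }, $text);
}

// An existing link is left alone; a bare URL is wrapped.
echo linkify('... <a href="http://www.google.com">http://www.google.com</a> ...'), "\n";
echo linkify('see http://example.com/a now'), "\n";
```

Note that the lookahead only checks for a closing </a> before the next "<", so a URL inside an anchor that contains nested tags could still slip through. That is the reliability caveat mentioned above.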

Regex to parse URLs into links, but only if they are not already links
