Question

I have the following code to detect and replace links:

        // need to find anchors
        Regex urlRx = new Regex(@"((https?|ftp|file)\://|www\.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\#\&\=;\+!'\(\)\*\-\._~%]*)*", RegexOptions.IgnoreCase);
        MatchCollection matches = urlRx.Matches(source);
        foreach (Match match in matches)
        {
            source = source.Replace(match.Value, "<a  target=\"_blank\" href='" + match.Value + "'>" + match.Value + "</a>");
        }

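But this doesn't work well when source already contains an anchor, because it replaces the inside of the existing anchor with another anchor. How can I prevent this?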

Example I/O:

http://www.google.com   ->   <a target="_blank" href="http://www.google.com">http://www.google.com</a>
Pre-existing anchors (<a></a>) -> unchanged

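I figure it would work to prevent matching any URL that is preceded by a non-whitespace character (or a quote, "), but I don't know how to do that.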

Answer 1

You just need to check whether there is already a pre-existing anchor:

        // Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
        Regex urlRx = new Regex(@"((https?|ftp|file)\://|www\.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\#\&\=;\+!'\(\)\*\-\._~%]*)*", RegexOptions.IgnoreCase);
        MatchCollection matches = urlRx.Matches(source);

        // Collect the href values of the anchors already present in source.
        // The alternation is grouped so that both quote styles stay tied to "href=".
        var rxAnchor = new Regex("<a [^>]*href=(?:'(?<href>.*?)'|\"(?<href>.*?)\")", RegexOptions.IgnoreCase);
        List<string> urls = rxAnchor.Matches(source).OfType<Match>().Select(m => m.Groups["href"].Value).ToList();

        foreach (Match match in matches)
        {
            if (urls.Contains(match.Value))
            {
                // This URL is already the href of an existing anchor
                // DO Your Stuff here
            }
            else
            {
                source = source.Replace(match.Value, "<a  target=\"_blank\" href='" + match.Value + "'>" + match.Value + "</a>");
            }
        }
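
As an alternative to collecting hrefs first, the "leave pre-existing anchors unchanged" requirement can also be handled in a single pass. The following is only a minimal sketch, not a definitive implementation: the class name `Linkifier` and method name `Linkify` are made up for illustration, and the URL pattern is the one from the question. The idea is to let one regex match either an existing anchor (or any other tag) or a bare URL, and to rewrite only the bare URLs via a `MatchEvaluator`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names Linkifier and Linkify are assumptions for this sketch.
public static class Linkifier
{
    // Group "skip": a whole existing anchor (<a ...>...</a>) or any other single tag.
    // Group "url":  a bare URL, using the pattern from the question.
    // Because "skip" is tried first, URLs inside anchors or tag attributes are
    // consumed as part of the skipped text and never reach the "url" branch.
    private static readonly Regex LinkRx = new Regex(
        @"(?<skip><a\b[^>]*>.*?</a>|<[^>]+>)" +
        @"|(?<url>((https?|ftp|file)\://|www\.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\#\&\=;\+!'\(\)\*\-\._~%]*)*)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Linkify(string source)
    {
        return LinkRx.Replace(source, m =>
        {
            // Existing markup is returned untouched.
            if (m.Groups["skip"].Success)
            {
                return m.Value;
            }

            // Bare URL: wrap it in a new anchor.
            string url = m.Groups["url"].Value;
            return "<a target=\"_blank\" href=\"" + url + "\">" + url + "</a>";
        });
    }
}
```

Calling `Linkifier.Linkify(source)` replaces the whole `foreach` loop. Since each match is rewritten exactly once at its own position, a URL that occurs several times is not wrapped twice, which `source.Replace` inside a loop cannot guarantee. Regex-based handling of HTML remains fragile, so for messy markup an HTML parser is probably the safer choice.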

Detecting URLs with RegEx - how to prevent double linking?
