Question

I have a sentence that may contain URLs. I need to take any upper-case URL that starts with WWW. and prepend HTTP:// to it. I have tried the following:

    private string ParseUrlInText(string text)
    {
        string currentText = text;

        foreach (string word in currentText.Split(new[] { "\r\n", "\n", " ", "</br>" }, StringSplitOptions.RemoveEmptyEntries))
        {
            string thing;
            if (word.ToLower().StartsWith("www."))
            {
                if (IsAllUpper(word))
                {
                    thing = "HTTP://" + word;

                    currentText = ReplaceFirst(currentText, word, thing);
                }
            }
        }

        return currentText;
    }

    public string ReplaceFirst(string text, string search, string replace)
    {
        int pos = text.IndexOf(search);
        if (pos < 0)
        {
            return text;
        }
        return text.Substring(0, pos) + replace + text.Substring(pos + search.Length);
    }

    private static bool IsAllUpper(string input)
    {
        return input.All(t => !Char.IsLetter(t) || Char.IsUpper(t));
    }

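However, it only prepends multiple HTTP:// prefixes to the first URL, using the following:

WWW.GOOGLE.CO.ZA
  WWW.GOOGLE.CO.ZA WWW.GOOGLE.CO.ZA
  HTTP://WWW.GOOGLE.CO.ZA
  There are a lot of domains (this should not be parsed)

to

HTTP://WWW.GOOGLE.CO.ZA
  HTTP://WWW.GOOGLE.CO.ZA HTTP://WWW.GOOGLE.CO.ZA
  HTTP://WWW.GOOGLE.CO.ZA
  There are a lot of domains (this should not be parsed)

Could someone please show me the proper way to do this?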

Edit: I need to keep the formatting of the string (spaces, newlines, etc.)
Edit 2: A URL might already have HTTP:// prepended. I have updated the demo.

Answer 1

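The problem with your code: you are using a ReplaceFirst method, which does exactly what it is meant to do. It replaces the first occurrence, which is obviously not always the one you want to replace. That is why only your first WWW.GOOGLE.CO.ZA gets all the prepended HTTP:// prefixes.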
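To make the effect concrete, here is a small sketch that replays the question's loop on a one-line input. ReplaceFirst is taken from the question; the ReplaceFirstTrace class, the Trace helper, and the sample input are made up for illustration, and the loop is simplified to space-separated words.

```csharp
using System;

public static class ReplaceFirstTrace
{
    // Same logic as in the question: replaces only the first occurrence of search.
    public static string ReplaceFirst(string text, string search, string replace)
    {
        int pos = text.IndexOf(search);
        if (pos < 0)
        {
            return text;
        }
        return text.Substring(0, pos) + replace + text.Substring(pos + search.Length);
    }

    // Hypothetical helper: mimics the question's loop for space-separated words
    // and prints the text after every replacement.
    public static string Trace(string text)
    {
        string current = text;
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (word.StartsWith("WWW.", StringComparison.Ordinal))
            {
                // IndexOf always finds the leftmost match. After the first pass
                // that match sits inside the already prefixed "HTTP://WWW...".
                current = ReplaceFirst(current, word, "HTTP://" + word);
                Console.WriteLine(current);
            }
        }
        return current;
    }

    public static void Main()
    {
        Trace("WWW.A.COM WWW.A.COM");
        // prints:
        // HTTP://WWW.A.COM WWW.A.COM
        // HTTP://HTTP://WWW.A.COM WWW.A.COM
    }
}
```

The second word is never touched, because every call finds the first occurrence again.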

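One approach would be to use a StreamReader or something similar: each time you get a new word, you check whether its first four characters are "WWW." and, if so, insert the string "HTTP://" at that position in the reader. But that is pretty heavy for something that can be much shorter...

So, let's go with Regex!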

How to insert characters before a word with Regex

Regex.Replace(input, @"([abc])", "adding_text_before_match$1"); // $1 needs a capture group, hence the parentheses

How to match words not starting with another word:

(?<!wont_start_with_that)word_to_match
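As a small illustration of the two building blocks combined, here is a sketch with made-up names (the LookbehindDemo class, the AddPrefix method, and the "foo"/"bar" sample strings are not from the original answer):

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Prefixes "bar" with "foo" unless it is already preceded by "foo".
    public static string AddPrefix(string input)
    {
        // (?<!foo) is a negative lookbehind: it asserts that the text right
        // before the match is not "foo", without consuming any characters.
        // $1 in the replacement refers to the first capture group, (bar).
        return Regex.Replace(input, @"(?<!foo)(bar)", "foo$1");
    }

    public static void Main()
    {
        Console.WriteLine(AddPrefix("bar foobar")); // prints "foobar foobar"
    }
}
```

The second "bar" is left alone because the lookbehind sees "foo" directly in front of it.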

This leads us to:

private string ParseUrlInText(string text)
{
    return Regex.Replace(text, @"(?<!HTTP://)(WWW\.[A-Za-z0-9_\.]+)",
        @"HTTP://$1");
}
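Note that the character class above also accepts lower-case letters after WWW., and the lookbehind only rules out an upper-case HTTP:// prefix. If only all-upper-case hosts should be rewritten, as the question asks, a stricter variant might look like the sketch below. The UrlPrefixer class name and the exact character classes are assumptions of mine, not part of the original answer, and they may need adjusting for real-world URLs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlPrefixer
{
    // (?<![A-Za-z0-9/])  : WWW must not follow a letter, digit or slash,
    //                      so "HTTP://WWW..." and "http://WWW..." are skipped.
    // (WWW\.[A-Z0-9_.-]+): upper-case host only (case-sensitive match).
    // (?![A-Za-z0-9_-])  : the host must not continue with further word
    //                      characters, which rejects mixed-case hosts.
    private static readonly Regex UpperCaseWww = new Regex(
        @"(?<![A-Za-z0-9/])(WWW\.[A-Z0-9_.-]+)(?![A-Za-z0-9_-])");

    public static string ParseUrlInText(string text)
    {
        // Only the matched host is rewritten; all whitespace and line
        // breaks around it are left exactly as they were.
        return UpperCaseWww.Replace(text, "HTTP://$1");
    }

    public static void Main()
    {
        string input = "WWW.GOOGLE.CO.ZA\n  WWW.GOOGLE.CO.ZA WWW.GOOGLE.CO.ZA\n  HTTP://WWW.GOOGLE.CO.ZA";
        Console.WriteLine(ParseUrlInText(input));
    }
}
```

Because the regex rewrites matches in place, the formatting requirement from the question's first edit is met without splitting the text.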

Answer 2

I would go with the following, which ensures that:

1) you don't process the same element twice,
2) you replace all instances at once

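// Requires: using System.Linq; (for Distinct)
private string ParseUrlInText(string text)
{
    string currentText = text;
    var workingText = currentText.Split(new[] { "\r\n", "\n", " ", "</br>" },
                          StringSplitOptions.RemoveEmptyEntries).Distinct(); // .Distinct() gives us just unique entries!
    foreach (string word in workingText)
    {
        string thing;
        if (word.ToLower().StartsWith("www."))
        {
            if (IsAllUpper(word))
            {
                thing = "HTTP://" + word;

                // A word at the very start of the text has no separator in front of it,
                // so the Replace calls below would miss it.
                if (currentText.StartsWith(word))
                {
                    currentText = thing + currentText.Substring(word.Length);
                }

                // Replace every occurrence that directly follows one of the separators.
                currentText = currentText.Replace("\r\n" + word, "\r\n" + thing)
                                         .Replace("\n" + word, "\n" + thing)
                                         .Replace(" " + word, " " + thing)
                                         .Replace("</br>" + word, "</br>" + thing);
            }
        }
    }

    return currentText;
}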
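A related way to get the same two guarantees is to split with Regex.Split and a capturing group, which keeps the separators in the resulting array so the text can be reassembled unchanged. This is only a sketch: the SplitAndRejoin class is a made-up name, IsAllUpper is copied from the question, and the separator list mirrors the one used above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitAndRejoin
{
    // Copied from the question: true if every letter in input is upper case.
    private static bool IsAllUpper(string input)
    {
        return input.All(t => !Char.IsLetter(t) || Char.IsUpper(t));
    }

    public static string ParseUrlInText(string text)
    {
        // Because the separators are inside a capturing group, Regex.Split
        // returns them as array elements too, so nothing is lost.
        string[] parts = Regex.Split(text, @"(\r\n|\n| |</br>)");

        for (int i = 0; i < parts.Length; i++)
        {
            // Each word is visited exactly once, at its own position.
            if (parts[i].StartsWith("WWW.", StringComparison.Ordinal) && IsAllUpper(parts[i]))
            {
                parts[i] = "HTTP://" + parts[i];
            }
        }

        // Concatenating words and separators restores the original layout.
        return string.Concat(parts);
    }

    public static void Main()
    {
        Console.WriteLine(ParseUrlInText("WWW.A.COM WWW.A.COM\nHTTP://WWW.A.COM"));
    }
}
```

Since each array element is rewritten at its own index, there is no search step that could hit the wrong occurrence.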

C#: parse URLs in text
