Question

Note: my problem is not that my links fail to be replaced. The problem is that the replacement gets NESTED. For example, take this comment:

some string with www.google.com/blah/blah also something else www.google.com

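When the replacement for the second link runs, it also matches part of the first link (www.google.com/blah/blah), so that link gets replaced twice.

I have a web app that lets users post comments. When I display a comment on the page, I process the input string and convert every link into an HTML anchor. The original user input is stored in the DB untouched, so it cannot be corrupted by this processing. I only transform it at display time.

Here is the logic I use to replace all links with their HTML form:

Find all links with a regex.
For each match, replace the link in the original string with its HTML version.
Finally, display the string.

For example, www.google.com becomes <a href="http://www.google.com" target="_blank">www.google.com</a> before it is displayed on the page.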

This worked until recently, when one of my clients posted content containing two links from the same domain.

The links were, for example,

www.google.com/images/blahblah
www.google.com

My problem is that when the second string replacement runs (I am using StringBuilder.Replace), the first link gets replaced as well!

So, first,

www.google.com/images/blahblah

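becomes

<a href="http://www.google.com/images/blahblah" target="_blank">www.google.com/images/blahblah</a>

That is fine. The problem comes with the second replacement: because the replace is global, it also rewrites text inside the link that was already processed. The anchor above gets mangled because it still contains www.google.com.

The result is so messy that I end up with a garbled string.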
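To make the failure concrete, here is a minimal sketch that reproduces the nesting. The class name `NestedReplaceDemo`, the method `NaiveLinkify`, and the simplified `www.`-only pattern are assumptions made for illustration; they are not the code from the application described above.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class NestedReplaceDemo
{
    // Simplified, hypothetical pattern (www.-style links only), for illustration.
    private const string Pattern = @"www\.[\w\-]+(\.[\w\-]+)+(/[\w\-./]*)?";

    public static string NaiveLinkify(string input)
    {
        var sb = new StringBuilder(input);
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            string anchor = "<a href=\"http://" + m.Value + "\" target=\"_blank\">" + m.Value + "</a>";
            // StringBuilder.Replace rewrites EVERY occurrence of m.Value,
            // including occurrences inside anchors built by earlier iterations.
            sb.Replace(m.Value, anchor);
        }
        return sb.ToString();
    }
}
```

Calling `NestedReplaceDemo.NaiveLinkify("www.google.com/images/blahblah and www.google.com")` yields an anchor nested inside the first link's `href`, which is the garbled output described above.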

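How do I avoid this?

Does Regex.Matches give me the index of each match? I could not find it anywhere.

What is the best way to handle this? Any suggestions?

I could do it by walking the string manually, but that is long and painful. There must be a nicer way...

EDIT: adding extra information, as requested:

My regex: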

    string rPattern = @"(((http|ftp|https):\/\/)|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#!]*[\w\-\@?^=%&/~\+#])?";

    Regex rLinks = new Regex(rPattern, RegexOptions.IgnoreCase);
    MatchCollection matches = rLinks.Matches(inputString);

Then I am using

foreach (Match match in matches)
{
    if (match.Value.StartsWith("www.youtube.com/watch"))
    {
        // logic to embed youtube video - this works fine.
    }
}

//Here I replace all hyperlinks to their <a href> parts

Answer 1

Regex.Matches returns a MatchCollection. Match.Index is exactly what you are looking for.

string pattern = @"(https?://)?(?:www(?:\.\w+)+|(?:\w+\.)+(?:com|org|us|net|...))(/\w*)*"; // your pattern here.
foreach (Match match in Regex.Matches (input, pattern))
{
   // Use match.Index and match.Length;
}
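As a rough sketch of how `Match.Index` and `Match.Length` avoid the nesting problem: splice each anchor in by position rather than by text, and work from the last match to the first so that the earlier indices remain valid. The class name `IndexLinkifier` and the simplified pattern are assumptions for illustration only.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class IndexLinkifier
{
    // Simplified, hypothetical pattern (www.-style links only), for illustration.
    private const string Pattern = @"www\.[\w\-]+(\.[\w\-]+)+(/[\w\-./]*)?";

    public static string Linkify(string input)
    {
        MatchCollection matches = Regex.Matches(input, Pattern);
        var sb = new StringBuilder(input);

        // Walk backwards: replacing a later match never shifts the
        // positions of the matches that come before it.
        for (int i = matches.Count - 1; i >= 0; i--)
        {
            Match m = matches[i];
            string anchor = "<a href=\"http://" + m.Value + "\" target=\"_blank\">" + m.Value + "</a>";
            sb.Remove(m.Index, m.Length).Insert(m.Index, anchor);
        }
        return sb.ToString();
    }
}
```

Each match is touched exactly once, at its own position, so a short link such as www.google.com can no longer be rewritten inside a longer one.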

But really, you are probably looking for something more like this:

string originalPost = 
   @"Ooh shiney: www.google.com/images/blahblah
   Look here: www.google.com";

// $0 is the entire match; $1 would be only the first capture group.
string html = Regex.Replace (
   originalPost, patternString, 
   "<a href='http://$0' target='_blank'>$0</a>");

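Alternatively, you can use a MatchEvaluator for more advanced work (for example, making sure we do not add a second http://):

string html = Regex.Replace (
   originalPost, patternString, 
   m => 
      string.Format (
         "<a href='{0}{1}' target='_blank'>{1}</a>",
          // Only prepend a scheme when the match does not already start with one.
          m.Value.StartsWith ("http", StringComparison.OrdinalIgnoreCase) ? "" : "http://",
          m.Value));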

Answer 2

I had the same need, and this is what I have been using for the past several years:

public static string MakeCommentSafe(string strComment)
{
    // Normalize CR/LF and bare CR to LF. Then HtmlEncode. Then collapse runs of two or more line feeds into exactly two (one blank line).
    strComment = Regex.Replace(System.Web.HttpContext.Current.Server.HtmlEncode(Regex.Replace(strComment, "\r\n", "\n").Replace((char)13, (char)10)), "\n(\n)+", "$1\n");

    // Find all links and make them active
    return Regex.Replace(Regex.Replace(strComment, @"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)", "<a href=\"$1\" target=\"_blank\" rel=\"nofollow\">$1</a>"), "\n", "<br />");
}

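Here is a tip. If you expect pages with many comments, store both the unsafe and the safe version in the database when the comment is posted. That way you do not have to call this function again for every comment each time the page is displayed.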

Answer 3

Use the Regex.Replace method, for example:

var result = Regex.Replace(input, pattern, "<a href=\"$0\" target=\"_blank\">$0</a>");

Answer 4

Playing devil's advocate:

So, you want to linkify strings that look like:

www.example.com
www.example.com/foo/bar
www.example.co.tw/baz.moo?foo=1

But not strings that look like:

www.example.com www.example.com/foo/bar www.example.co.tw/baz.moo?foo=1

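I am guessing that is right. The simple solution: extend your regex to look at both sides of anything that looks like a URL, and ignore it when it sits in one of these places:

Between href=" and " target="_blank">
Between " target="_blank"> and </a>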
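One way this idea might be sketched in .NET is with a negative lookbehind, which rejects a candidate URL when it is immediately preceded by `href="http://` or by `">`. The class name `LookaroundLinkifier` and the simplified pattern are assumptions for illustration. The check is tied to the exact anchor format used in this thread and is not a general HTML parser.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundLinkifier
{
    // Hypothetical pattern: a www.-style link that is NOT directly preceded by
    // href="http:// (inside an attribute) or by "> (start of anchor text).
    private const string Pattern =
        @"(?<!href=""http://|"">)www\.[\w\-]+(\.[\w\-]+)+(/[\w\-./]*)?";

    public static string Linkify(string input)
    {
        // $0 is the whole matched link.
        return Regex.Replace(input, Pattern,
            "<a href=\"http://$0\" target=\"_blank\">$0</a>");
    }
}
```

With this guard, running `Linkify` on text that already contains anchors in this format leaves those anchors alone and only wraps the bare links.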
