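Optimizing a Regex replace inside a for loop?

Asked: 2012-12-15 15:29:58

Tags: c# regex optimization replace

This is basically a follow-up to my previous question. I've been using this code to replace strings contained in an array: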

string[] replacements = {"these",
                         "words",
                         "will",
                         "get",
                         "replaced"};

string newString = "Hello.replacedthesewordswillgetreplacedreplaced";

for (int j = 0; j < replacements.Length; j++)
{
    newString = Regex.Replace(newString,
    @"((?<firstMatch>(" + replacements[j] + @"))(\k<firstMatch>)*)",
    m => "[" + j + "," + (m.Groups[3].Captures.Count + 1) + "]");
}

After running this code, newString will be:

Hello.[4,1][0,1][1,1][2,1][3,1][4,2]

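This works fine for small replacement sets like the one above, where the string is replaced almost instantly. With a large number of replacements, though, it tends to slow down.

Can anyone see a way I could optimize this so that it replaces faster?

I assume the for loop is what's slowing it down. The array always contains some strings that don't need replacing (because they don't occur in the main newString string), so I was wondering whether there's a way to check for that before the for loop. That might turn out to be slower, though...

I can't think of a better way, so I thought I'd ask. Thanks for your help! :)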
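The pre-check idea mentioned above could be sketched roughly as follows. This is only an illustration, not a measured improvement: `PreFilterSketch` and `ReplaceWithPreCheck` are hypothetical names, the words are assumed to be non-empty literal text (hence `Regex.Escape`), and the pattern is a simplified one that matches a run of the same word and derives the repeat count from the match length rather than from capture counts.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PreFilterSketch
{
    // Hypothetical helper: only runs a regex for words that occur in the input.
    public static string ReplaceWithPreCheck(string input, string[] replacements)
    {
        string result = input;
        for (int j = 0; j < replacements.Length; j++)
        {
            string word = replacements[j];

            // Cheap ordinal substring test; words that are absent (or empty)
            // never reach the regex engine.
            if (word.Length == 0 || result.IndexOf(word, StringComparison.Ordinal) < 0)
                continue;

            int index = j; // local copy for the lambda
            // One or more consecutive copies of the word, treated as literal text.
            string pattern = "(?:" + Regex.Escape(word) + ")+";
            result = Regex.Replace(result, pattern,
                m => "[" + index + "," + (m.Length / word.Length) + "]");
        }
        return result;
    }
}
```

Whether the extra `IndexOf` pays off depends on how many of the words are actually absent from the input, so it is worth timing against real data.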

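1 Answer:

Answer 0: (score: 1)

There are two approaches you could try (NB: both are untested, but I believe they should work and be faster than your current code).

Using a static compiled regex: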

private static readonly Dictionary<string, int> Indexes = new Dictionary<string, int> 
{
  { "these", 0 },
  { "words", 1 },
  { "will", 2 },
  { "get", 3 },
  { "replaced", 4 },
};

// Built once and reused; assumes the keys contain no regex metacharacters.
private static readonly Regex ReplacementRegex = new Regex(string.Join("|", Indexes.Keys), RegexOptions.Compiled);

...
// Running number of matches seen so far for each word.
// Note: this numbers occurrences; it does not collapse consecutive repeats.
var occurrences = Indexes.Keys.ToDictionary(k => k, k => 0);
return ReplacementRegex.Replace(newString, m => {
  var count = occurrences[m.Value] + 1;
  occurrences[m.Value] = count;
  return "[" + Indexes[m.Value] + "," + count + "]";
});
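If the goal is to keep the question's output format (consecutive repeats of a word collapsed into a single `[index,repeatCount]` token) while still making only one pass, the combined-regex idea can be extended with a backreference. The following is a sketch under assumptions, not a tested drop-in: `SinglePassSketch` and `ReplaceAll` are hypothetical names, the words are assumed to be non-empty literal text, and because it scans left to right once, the result may differ from the word-by-word loop when words overlap one another.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SinglePassSketch
{
    public static string ReplaceAll(string input, string[] replacements)
    {
        // Map each (non-empty) word to its array index for lookup in the callback.
        var indexes = new Dictionary<string, int>(StringComparer.Ordinal);
        for (int j = 0; j < replacements.Length; j++)
        {
            if (replacements[j].Length > 0)
                indexes[replacements[j]] = j;
        }
        if (indexes.Count == 0)
            return input;

        // Longest alternatives first, so a short word cannot shadow a longer
        // word that starts with it. Regex.Escape treats each word as literal text.
        string alternation = string.Join("|",
            indexes.Keys.OrderByDescending(w => w.Length).Select(w => Regex.Escape(w)));

        // One word, followed by zero or more repeats of that same word.
        var regex = new Regex("(?<word>" + alternation + @")(?:\k<word>)*");

        return regex.Replace(input, m =>
        {
            string word = m.Groups["word"].Value;
            // The match is a whole number of copies of the word.
            return "[" + indexes[word] + "," + (m.Length / word.Length) + "]";
        });
    }
}
```

When the word list is fixed, the `Regex` could instead be built once in a `static readonly` field with `RegexOptions.Compiled`, as in the answer's code, so that construction cost is not paid on every call.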

Without a regex:

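for (int j = 0; j < replacements.Length; j++)
{
  var index = 0;
  var count = 0;
  var replacement = replacements[j];
  // Ordinal comparison avoids the slower culture-sensitive default.
  while((index = newString.IndexOf(replacement, index, StringComparison.Ordinal)) > -1) 
  {
    count++; // running occurrence number of this word, not a run length
    var token = "[" + j + "," + count + "]";
    newString = newString.Substring(0, index) + token + newString.Substring(index + replacement.Length);
    index += token.Length; // resume searching after the inserted token
  }
}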
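The loop above rebuilds the whole string on every hit, which can get expensive for long inputs. One possible alternative, sketched here under assumptions rather than benchmarked, is a single left-to-right scan that appends to a `StringBuilder` and only tests words whose first character matches the current position. `NoRegexSketch` and `ReplaceAll` are hypothetical names; when several words match at the same position, the one that comes first in the array wins, and consecutive repeats are collapsed into one token as in the question's output.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class NoRegexSketch
{
    public static string ReplaceAll(string input, string[] replacements)
    {
        // Bucket word indexes by first character so most positions are
        // rejected with a single dictionary lookup.
        var byFirstChar = new Dictionary<char, List<int>>();
        for (int j = 0; j < replacements.Length; j++)
        {
            string w = replacements[j];
            if (w.Length == 0) continue; // empty words can never match
            List<int> bucket;
            if (!byFirstChar.TryGetValue(w[0], out bucket))
                byFirstChar[w[0]] = bucket = new List<int>();
            bucket.Add(j);
        }

        var output = new StringBuilder(input.Length);
        int pos = 0;
        while (pos < input.Length)
        {
            int matched = -1;
            List<int> candidates;
            if (byFirstChar.TryGetValue(input[pos], out candidates))
            {
                foreach (int j in candidates)
                {
                    string w = replacements[j];
                    if (string.CompareOrdinal(input, pos, w, 0, w.Length) == 0)
                    {
                        matched = j;
                        break;
                    }
                }
            }

            if (matched < 0)
            {
                output.Append(input[pos++]); // no word starts here; copy the character
                continue;
            }

            // Consume the whole run of consecutive copies of the matched word.
            string word = replacements[matched];
            int run = 0;
            while (string.CompareOrdinal(input, pos, word, 0, word.Length) == 0)
            {
                run++;
                pos += word.Length;
            }
            output.Append('[').Append(matched).Append(',').Append(run).Append(']');
        }
        return output.ToString();
    }
}
```

For very large word lists a trie or an Aho-Corasick automaton would likely scale better than first-character buckets; the sketch keeps to simple collections for readability.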