Question

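I wrote a C# program that opens a specific directory. It then opens every file in that directory and counts each occurrence of the regular expression @"^CLM". The program returns the regex count for each file and puts that count into a separate cell in a spreadsheet. The code I'm using is below: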

List<string> linesPost = System.IO.File.ReadAllLines(diPostFiles + curPostFile).ToList();
int y = 0;
for (int i = linesPost.Count - 1; i >= 0; i--)
{
     string pattern = @"^CLM";
     Match m = Regex.Match(linesPost[i], pattern);
     while (m.Success)
     {
         y++;
         break;
     }
     (xlRange.Cells[startRow + x, 3] as Excel.Range).Value2 = y;
}

This works, but it takes a long time. For example, if I open a given file in Notepad++, enter the same regex, and click the Count button, I get the result very quickly.

Is there a more efficient way to count the occurrences of a regex? I expect roughly 5,000 occurrences per text file, and each text file is about 5 MB.

Any help is greatly appreciated.

Answer 1

First of all, you don't need a regex at all. You are only checking whether each line starts with CLM.

Instead of

string pattern = @"^CLM";
Match m = Regex.Match(linesPost[i], pattern);
while (m.Success)
{
   y++;
   break;
}

you can use

if (linesPost[i].StartsWith("CLM"))
    y++;

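If you keep "CLM" (or the pattern) in a variable, assign it once before the loop rather than on every iteration, since it does not change while the loop runs.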
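A minimal sketch of the two suggestions above, packaged as a self-contained method. The class and method names are hypothetical, and two details go beyond the answer: it streams the file with File.ReadLines instead of loading everything with ReadAllLines, and it passes StringComparison.Ordinal so StartsWith does a plain character comparison rather than a culture-sensitive one.

```csharp
using System;
using System.IO;

public static class ClmLineCounter
{
    // The prefix never changes, so it is declared once, outside any loop.
    private const string Prefix = "CLM";

    // Counts the lines of a file that start with "CLM", without using a regex.
    // File.ReadLines yields one line at a time instead of building an array.
    public static int CountLinesStartingWithClm(string path)
    {
        int count = 0;
        foreach (string line in File.ReadLines(path))
        {
            // Ordinal = plain character-by-character comparison,
            // skipping culture-sensitive comparison rules.
            if (line.StartsWith(Prefix, StringComparison.Ordinal))
            {
                count++;
            }
        }
        return count;
    }
}
```

For one file from the question, the call would be ClmLineCounter.CountLinesStartingWithClm(diPostFiles + curPostFile).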

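Also, you have a line that uses early binding for Excel interop. I'd recommend using late binding or the dynamic type for the Excel objects, and doing the Excel work after the loop. Right now you write to the cell inside the loop, once for every line of the file, and that can take a long time. Declare a List<string> variable before the loop, collect the values in it, and insert them into Excel only after everything has been collected.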
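A sketch of the collect-then-write idea, under stated assumptions: the helper names are hypothetical, it stores the counts as int rather than in a List<string>, it processes files in ordinal file-name order, and it leaves out the Excel interop itself so it runs without Office installed. The second method packs the counts into a one-column 2D array, which is the shape Excel interop accepts when a whole block of cells is assigned through Range.Value2.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DirectoryClmCounts
{
    // Counts "CLM" lines in every file of a directory and keeps the results
    // in memory; no Excel calls happen inside this loop.
    public static List<int> CountPerFile(string directory)
    {
        var counts = new List<int>();
        string[] files = Directory.GetFiles(directory);
        Array.Sort(files, StringComparer.Ordinal); // predictable row order
        foreach (string file in files)
        {
            int y = 0;
            foreach (string line in File.ReadLines(file))
            {
                if (line.StartsWith("CLM", StringComparison.Ordinal))
                    y++;
            }
            counts.Add(y);
        }
        return counts;
    }

    // Converts the counts into an N x 1 array (one row per file, one column).
    public static object[,] ToColumn(List<int> counts)
    {
        var block = new object[counts.Count, 1];
        for (int i = 0; i < counts.Count; i++)
            block[i, 0] = counts[i];
        return block;
    }
}
```

After the loop, a single assignment along the lines of targetRange.Value2 = DirectoryClmCounts.ToColumn(counts); would write every count in one call instead of one call per cell; targetRange is a hypothetical Excel.Range covering counts.Count rows of column 3, starting at startRow.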

Answer 2

If you want speed, read the whole file into a string variable and then run the regex on it, as shown below.

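This is the fastest approach, for two reasons:
1. The lines are contiguous rather than split into an array.
2. The regex engine's code stays at the lowest level until it finds a match (that is, it returns one match at a time, and that match may be hundreds of lines past the previous one).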

Note: you did say speed. If you don't need the speed, don't do it this way.

int y = 0;
// Read the whole file into a single string instead of an array of lines.
string allLines = System.IO.File.ReadAllText(diPostFiles + curPostFile);
// (?m) turns on multiline mode inline, so ^ matches at the start of every line.
// .NET supports this inline option; RegexOptions.Multiline in the constructor is equivalent.
Regex RxCounter = new Regex(@"(?m)^CLM");
Match _m = RxCounter.Match(allLines);
while (_m.Success)
{
    y++;
    _m = _m.NextMatch();
}
// Write the total to Excel once, after counting, rather than once per match.
(xlRange.Cells[startRow + x, 3] as Excel.Range).Value2 = y;
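The same whole-file approach as a self-contained sketch. The class name is hypothetical; it sets RegexOptions.Multiline in the constructor instead of the inline (?m), builds the Regex once as a static field, and uses Matches(...).Count in place of the NextMatch loop, which counts the same matches.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class WholeFileClmCounter
{
    // Built once and reused. RegexOptions.Multiline has the same effect as (?m):
    // ^ matches at the start of every line, not only at the start of the string.
    private static readonly Regex ClmAtLineStart =
        new Regex("^CLM", RegexOptions.Multiline);

    // Counts matches in text that is already in memory.
    public static int CountInText(string text) => ClmAtLineStart.Matches(text).Count;

    // Reads the whole file into one string, then counts all matches in one pass.
    public static int CountInFile(string path) => CountInText(File.ReadAllText(path));
}
```

Because ^ in multiline mode matches right after each \n, Windows line endings (\r\n) are handled as well.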

Answer 3

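You can construct the regex outside the loop (var r = new Regex(pattern, ...)) and apply it inside (r.Match(...)). That alone can give you some speedup, because the pattern doesn't have to be looked up and prepared again for every line. (The static Regex.Match does cache recently used patterns, so the gain may be modest.)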
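A minimal sketch of this advice, assuming the pattern is ^CLM from the question. The names are hypothetical, the constructor is called without options, and it uses IsMatch, which for counting matching lines is equivalent to checking r.Match(...).Success.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HoistedRegexCounter
{
    // Constructed once, outside any loop, and shared by every call.
    private static readonly Regex Pattern = new Regex("^CLM");

    // Applies the single Regex instance to each line inside the loop.
    public static int CountMatchingLines(IEnumerable<string> lines)
    {
        int y = 0;
        foreach (string line in lines)
        {
            if (Pattern.IsMatch(line))
                y++;
        }
        return y;
    }
}
```

Passing linesPost from the question's code would give the same count as the original loop, without creating the pattern on each iteration.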

How to count regex occurrences programmatically as fast as a text editor
