Question

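My application parses continuously incoming traces. I precompiled the regular expressions in an external library. Reading and analyzing the input data is done in the following function, which runs on a worker thread.
I have stripped the code down for demonstration purposes. It currently uses 30 different regular expressions, which are checked in sequence.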

    private void Filter()
    {
        Regex rgx_1 = new RegEx_1();
        Regex rgx_2 = new RegEx_2();
        ...
        Regex rgx_N = new RegEx_N();

        uint index = 0;
        while (!FilterThread.CancellationPending)
        {
            BufferLength = (int)Source.GetItemCount() - 1;
            if (index <= BufferLength)
            {
                item = (ColorItem)Source.GetItem(index);
                if (item != null)
                {
                    tracecontend = item.GetItemSummary();
                    if (rgx_1.IsMatch(tracecontend))
                    {
                        current_trace = new TraceLine(index, tracecontend, GROUP_1);
                    }
                    else if (rgx_2.IsMatch(tracecontend))
                    {
                        current_trace = new TraceLine(index, tracecontend, GROUP_2);
                    }
                    else if (rgx_3.IsMatch(tracecontend))
                    {
                        current_trace = new TraceLine(index, tracecontend, GROUP_3);
                    }
                    ...
                    else if (rgx_N.IsMatch(tracecontend))
                    {
                        current_trace = new TraceLine(index, tracecontend, GROUP_N);
                    }
                    listBox.Dispatcher.BeginInvoke(DispatcherPriority.Normal, new AddTraceDelegate(AddTrace), current_trace);
                }
                index++;
                System.Threading.Thread.Sleep(1);
            }
        }
    }

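With this approach I can process at most 500 traces per second, which is enough for live tracing. However, reading a file that contains up to 2,000,000 traces still takes a long time.

Do you have any idea how to speed up execution and increase throughput?

Does anyone know of best practices for this situation?

Edit: here is an example of one of the regular expressions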

    compilationList.Add(new RegexCompilationInfo(@"SomeTextToFilterFor(.*?)",
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant,
            "RegEx_1",
            "Utilities.RegularExpressions",
            true));
    RegexCompilationInfo[] compilationArray = new RegexCompilationInfo[compilationList.Count];
    AssemblyName assemName = new AssemblyName("RegexLib, Version=1.0.0.1001, Culture=neutral, PublicKeyToken=null");
    compilationList.CopyTo(compilationArray);
    Regex.CompileToAssembly(compilationArray, assemName);

Answer 1

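There are many ways to improve the speed.

Merge your regular expressions if possible

Regular expressions are state machines. They may backtrack, but they try to do all the work in a single pass. One pass over the input is better than many separate matches.

For example: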

aaaaaab|aaaaaac

is slower than this:

aaaaaa(b|c)

And of course, running them as separate expressions is slower still.
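As a rough sketch of the merging idea applied to the filter loop from the question: several patterns can be joined into one alternation, with a named group per alternative so that the group that matched still identifies the trace category. The patterns `AnotherMarker` and `Timeout\s+\d+`, the class name `CombinedFilter`, and the group names are hypothetical placeholders, not the asker's real expressions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CombinedFilter
{
    // One alternation instead of N separate Regex objects.
    // Only the first alternative comes from the question; the others are made up.
    private static readonly Regex Combined = new Regex(
        @"(?<g1>SomeTextToFilterFor)|(?<g2>AnotherMarker)|(?<g3>Timeout\s+\d+)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);

    private static readonly string[] GroupNames = { "g1", "g2", "g3" };

    // Returns the 1-based number of the alternative that matched, or 0 if none did.
    public static int Classify(string trace)
    {
        Match m = Combined.Match(trace);
        if (!m.Success) return 0;

        for (int i = 0; i < GroupNames.Length; i++)
        {
            if (m.Groups[GroupNames[i]].Success) return i + 1;
        }
        return 0;
    }
}
```

Note that this is not an exact drop-in for the if/else chain: the chain prefers `rgx_1` wherever it occurs in the line, while the alternation reports whichever alternative matches at the leftmost position. Whether the merged form is actually faster depends on the patterns, so it is worth measuring on real trace data.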

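Optimize your regular expressions

You can use RegexBuddy for this. Just enter a pattern and some sample input, and you will see all the backtracking and the time-consuming parts of the regex. You can then change the structure of the pattern, or simply add an if-clause (a conditional) to the regex to prevent backtracking.

For example, when you know that part of the pattern can only match in certain cases, you can filter those cases: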

(?(?=/*fast to check condition*/)/*complex regex here*/|/*simple regex here*/)
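A concrete instance of this template might look like the sketch below. It uses the .NET conditional construct `(?(?=condition)yes|no)`: a cheap lookahead decides which branch is attempted, so the more expensive branch is only tried on lines that start with `[ERR]`. The trace format (`[ERR] <code> <message>` versus `INFO ...`) and the class name `ConditionalFilter` are assumptions made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionalFilter
{
    // Condition: the line starts with "[ERR]" (fast to check).
    // Yes-branch: parse the error code and message.
    // No-branch:  accept plain INFO lines without capturing anything.
    private static readonly Regex Gate = new Regex(
        @"^(?(?=\[ERR\])\[ERR\]\s+(?<code>\d+)\s+(?<msg>.+)|INFO\b.*)$",
        RegexOptions.CultureInvariant | RegexOptions.Compiled);

    public static bool IsMatch(string trace)
    {
        return Gate.IsMatch(trace);
    }

    // Returns the error code for "[ERR]" lines, or null for anything else.
    public static string ErrorCode(string trace)
    {
        Match m = Gate.Match(trace);
        if (m.Success && m.Groups["code"].Success)
        {
            return m.Groups["code"].Value;
        }
        return null;
    }
}
```

Once the condition has selected a branch, the other branch is never attempted for that position, which is what keeps the engine from exploring alternatives that cannot match.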

Precompile your patterns

Move them out of local scope into global scope (make them static) and add the RegexOptions.Compiled option.
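A minimal sketch of what that could look like, using the one pattern shown in the question's edit. The class name `TraceFilters` and the method `IsGroup1` are hypothetical; the regexes in the question are already built with `Regex.CompileToAssembly`, so this mainly illustrates the static-field form for patterns that are not.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TraceFilters
{
    // Constructed and compiled once, when the class is first used,
    // rather than each time the filter method runs.
    private static readonly Regex Rgx1 = new Regex(
        @"SomeTextToFilterFor(.*?)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);

    public static bool IsGroup1(string trace)
    {
        return Rgx1.IsMatch(trace);
    }
}
```

As a side note, a lazy `(.*?)` at the very end of a pattern always matches the empty string, so for a pure `IsMatch` check it can be dropped without changing the result.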
