Question

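Given the following from MSDN:

"A Regex object can be created on any thread and shared between threads."

I have found that, for performance, it is better NOT to share a Regex instance between threads, and to give each thread its own instance through the ThreadLocal class instead.

Can someone please explain why the thread-local instance runs roughly 5 times faster?

Here are the results (on an 8-core machine):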

   'Using Regex singleton' returns 3000000 and takes 00:00:01.1005695
   'Using thread local Regex' returns 3000000 and takes 00:00:00.2243880

Source code:

using System;
using System.Linq;
using System.Threading;
using System.Text.RegularExpressions;
using System.Diagnostics;

namespace ConsoleApplication1
{
    class Program
    {
        static readonly string str = new string('a', 400);
        static readonly Regex re = new Regex("(a{200})(a{200})", RegexOptions.Compiled);

        // Runs one million matches in parallel using the supplied Regex accessor and prints the elapsed time.
        static void Test(Func<Regex> regexGettingMethod, string methodDescription)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            var sum = Enumerable.Repeat(str, 1000000).AsParallel().Select(s => regexGettingMethod().Match(s).Groups.Count).Sum();
            sw.Stop();
            Console.WriteLine("'{0}' returns {1} and takes {2}", methodDescription, sum, sw.Elapsed);
        }

        static void Main(string[] args)
        {
            Test(() => re, "Using Regex singleton");

            var threadLocalRe = new ThreadLocal<Regex>(() => new Regex(re.ToString(), RegexOptions.Compiled));
            Test(() => threadLocalRe.Value, "Using thread local Regex");

            Console.Write("Press any key");
            Console.ReadKey();
        }
    }
}

Answer 1

Posting the results of my own investigation.

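Let's look at Regex in ILSpy. It holds a reference to a RegexRunner. When a Regex object matches something, it takes exclusive hold of its RegexRunner. If another concurrent request arrives for the same Regex object, a temporary RegexRunner instance is created for it. Creating a RegexRunner is expensive. The more threads share a Regex object, the more likely it is that time is wasted creating temporary RegexRunners. Hopefully Microsoft will do something to fix this in the era of massive parallelism.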
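The checkout behaviour described above can be illustrated with a small model. This is only a sketch of the pattern as the answer describes it, not the actual framework source: `ExpensiveRunner` and `SharedMatcher` are hypothetical names, and the "matching" is a stand-in that returns the input length.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for RegexRunner: counts how many instances get allocated.
sealed class ExpensiveRunner
{
    public static int Created;
    public ExpensiveRunner() { Interlocked.Increment(ref Created); }
    public int Run(string input) => input.Length; // placeholder for real matching work
}

// Hypothetical stand-in for a shared Regex that caches a single runner.
sealed class SharedMatcher
{
    private ExpensiveRunner _cached = new ExpensiveRunner();

    public int Match(string input)
    {
        // Try to take exclusive ownership of the single cached runner.
        ExpensiveRunner runner = Interlocked.Exchange(ref _cached, null);
        if (runner == null)
        {
            // Another thread currently holds the cached runner: allocate a temporary one.
            runner = new ExpensiveRunner();
        }
        try
        {
            return runner.Run(input);
        }
        finally
        {
            // Hand a runner back so the next caller can reuse it.
            Volatile.Write(ref _cached, runner);
        }
    }
}

static class Program
{
    static void Main()
    {
        var matcher = new SharedMatcher();

        for (int i = 0; i < 1000; i++) matcher.Match("abc");
        Console.WriteLine("Runners after sequential use: " + ExpensiveRunner.Created);

        Parallel.For(0, 1000000, i => matcher.Match("abc"));
        Console.WriteLine("Runners after parallel use: " + ExpensiveRunner.Created);
    }
}
```

With a single caller the cached runner is always available, so no extra instances are allocated. Under parallel load the second count may grow, and the exact number varies from run to run; in this model that growth is the cost the answer attributes to sharing one Regex across threads.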

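Another thing: the static members of the Regex class that take the pattern string as a parameter (such as Regex.IsMatch(input, pattern)) should also perform poorly when the same pattern is matched on different threads. Regex maintains a cache of RegexRunners. Two concurrent Regex.IsMatch() calls with the same pattern will try to use the same RegexRunner, and one of the threads will have to create a temporary RegexRunner.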
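If the static methods are affected in the way described, one possible workaround is to route hot parallel paths through a per-thread instance rather than Regex.IsMatch(input, pattern). The following is a minimal sketch under that assumption; the `PerThreadRegex` helper and the pattern `^a+$` are made up for illustration, and whether it helps in practice depends on the runtime version, so it is worth measuring.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading;

// Hypothetical helper: each thread lazily builds and keeps its own Regex instance.
static class PerThreadRegex
{
    private const string Pattern = "^a+$"; // illustrative pattern, not from the question

    private static readonly ThreadLocal<Regex> Local =
        new ThreadLocal<Regex>(() => new Regex(Pattern, RegexOptions.Compiled));

    // Same shape as the static Regex.IsMatch(input, pattern), but never shares an instance across threads.
    public static bool IsMatch(string input) => Local.Value.IsMatch(input);
}

static class Program
{
    static void Main()
    {
        string[] inputs = { "aaa", "abc", "a", "" };

        // Count matching inputs in parallel; only "aaa" and "a" consist solely of 'a'.
        int matches = inputs.AsParallel().Count(s => PerThreadRegex.IsMatch(s));
        Console.WriteLine(matches); // prints 2
    }
}
```

The trade-off is that every thread pays the construction (and, with RegexOptions.Compiled, compilation) cost once, so this only pays off when each thread performs many matches.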

Thanks to Will for letting me know how to handle a question when the original poster finds the answer.
