Question

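I am trying to understand the difference between threads and Parallel.For. I created two functions, one that uses Parallel.For and another that starts threads directly. Starting 10 threads appears to be faster. Can someone explain why? Do threads use the multiple processors available in the system (to execute in parallel), or are they just time-sliced by the CLR?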

public static bool ParallelProcess()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    Parallel.For(0, 10, x =>
    {
        Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
            Thread.CurrentThread.ManagedThreadId));
        Thread.Sleep(3000);
    });
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

public static bool ParallelThread()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    for (int i = 0; i < 10; i++)
    {
        Thread t = new Thread(new ThreadStart(Thread1));
        t.Start();
        if (i == 9)
            t.Join();
    }
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

private static void Thread1()
{
    Console.WriteLine(string.Format("Printing {0} thread = {1}", 0,
           Thread.CurrentThread.ManagedThreadId));
    Thread.Sleep(3000);
}

When I call the methods as shown below, Parallel.For takes twice as long as the threads.

Algo.ParallelThread(); //took 3 secs
Algo.ParallelProcess();  //took 6 secs

Answer 1

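Parallel uses however many threads the underlying scheduler provides, which to begin with is the minimum number of thread pool threads.

The minimum number of thread pool threads defaults to the number of processors. As time goes on, and based on many different factors (for example, all current threads being busy), the scheduler may decide to spawn more threads and go above that minimum.

All of this is managed for you to avoid unnecessary resource usage. Your second example circumvents all of it by spawning the threads manually. If you explicitly set the number of thread pool threads, e.g. ThreadPool.SetMinThreads(100, 100), you will see that the Parallel version also takes 3 seconds, because it immediately has more threads available.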
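As a rough way to try this out, the sketch below times ten sleeping Parallel.For iterations after raising the pool minimum. The class and method names (MinThreadsDemo, TimeParallelSleep) are made up for illustration, and the timings will depend on the machine; commenting out the SetMinThreads call gives the default behavior for comparison.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class MinThreadsDemo
{
    // Hypothetical helper: runs `count` sleeping iterations through
    // Parallel.For and returns the elapsed wall-clock time.
    public static TimeSpan TimeParallelSleep(int count, int sleepMs)
    {
        var sw = Stopwatch.StartNew();
        Parallel.For(0, count, _ => Thread.Sleep(sleepMs));
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        Console.WriteLine($"Processors: {Environment.ProcessorCount}, " +
                          $"min worker threads: {worker}, min I/O threads: {io}");

        // Raise the minimum so the pool can hand out up to 100 threads
        // on demand. Comment this line out to compare with the default.
        bool changed = ThreadPool.SetMinThreads(100, 100);
        Console.WriteLine($"SetMinThreads succeeded: {changed}");

        TimeSpan elapsed = TimeParallelSleep(10, 3000);
        Console.WriteLine($"Parallel.For with 10 sleeps: {elapsed.TotalSeconds:F2} s");
    }
}
```

SetMinThreads returns a bool, so it is worth checking that the change was actually accepted before drawing conclusions from the timing.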

Answer 2

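You have a lot going wrong here.

(1) Don't use sw.Elapsed.Seconds. This value is an int and (obviously) truncates the fractional part of the time. Worse, though, if the process took 61 seconds to complete it would report 1, just like the second hand on a clock. Instead, use sw.Elapsed.TotalSeconds, which reports a double with the total number of seconds regardless of how many minutes or hours have elapsed.

(2) Parallel.For uses the thread pool. This greatly reduces (or even eliminates) the overhead of creating threads. Every time you call new Thread(() => ...) you allocate upwards of 1MB of RAM and consume valuable resources before any processing takes place.

(3) You are artificially loading the threads with Thread.Sleep(3000);, which means the long sleeps swamp the time actually spent creating the threads.

(4) Parallel.For is by default limited by the number of CPU cores (more precisely, by the thread pool, which starts out with about one thread per core). So when you run 10 iterations the work is split into two steps, meaning the Thread.Sleep(3000); calls run twice in series, hence the 6 seconds. The new Thread approach runs all of the threads in one go, so it takes just over 3 seconds, but again the Thread.Sleep(3000); swamps the thread start-up time.

(5) You are also dealing with CLR JIT issues. The first time you run your code the start-up cost is enormous. Let's change the code to remove the sleeps and to join the threads properly: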

public static bool ParallelProcess()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    Parallel.For(0, 10, x =>
    {
        Console.WriteLine(string.Format("Printing {0} thread = {1}", x, Thread.CurrentThread.ManagedThreadId));
    });
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.TotalMilliseconds));

    return true;
}

public static bool ParallelThread()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    var threads = Enumerable.Range(0, 10).Select(x => new Thread(new ThreadStart(Thread1))).ToList();
    foreach (var thread in threads) thread.Start();
    foreach (var thread in threads) thread.Join();
    sw.Stop();

    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.TotalMilliseconds));

    return true;
}

private static void Thread1()
{
    Console.WriteLine(string.Format("Printing {0} thread = {1}", 0, Thread.CurrentThread.ManagedThreadId));  
}

Now, to get rid of the CLR/JIT start-up time, let's run the code like this:

ParallelProcess();
ParallelThread();
ParallelProcess();
ParallelThread();
ParallelProcess();
ParallelThread();

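The timings we get look like this (the values are milliseconds, since the code prints sw.Elapsed.TotalMilliseconds despite the "secs" label):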

Time in secs 3.8617
Time in secs 4.7719
Time in secs 0.3633
Time in secs 1.6332
Time in secs 0.3551
Time in secs 1.6148

The first run of each method is far slower than its second and third runs.
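The warm-up effect can be isolated with a small timing helper that runs the same action several times and keeps each duration, so the first (JIT-affected) run can be discarded. This is only a sketch; WarmUpDemo and TimeRuns are names invented here, not part of any library.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class WarmUpDemo
{
    // Hypothetical helper: runs `action` `runs` times and returns the
    // duration of each run in milliseconds.
    public static double[] TimeRuns(Action action, int runs)
    {
        var timings = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            timings[i] = sw.Elapsed.TotalMilliseconds;
        }
        return timings;
    }

    public static void Main()
    {
        // The first entry includes JIT compilation and pool start-up;
        // later entries are closer to the steady-state cost.
        double[] timings = TimeRuns(() => Parallel.For(0, 10, x => { }), 3);
        for (int i = 0; i < timings.Length; i++)
        {
            Console.WriteLine($"Run {i + 1}: {timings[i]:F4} ms");
        }
    }
}
```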

The result is that running Parallel.For is 4 to 5 times faster than calling new Thread.
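Point (1) is easy to check in isolation. The snippet below builds a 61.5 second TimeSpan by hand (ElapsedDemo is just an illustrative name) and prints both properties.

```csharp
using System;

public static class ElapsedDemo
{
    public static void Main()
    {
        // 61.5 seconds is 1 minute plus 1.5 seconds.
        TimeSpan elapsed = TimeSpan.FromSeconds(61.5);

        // Seconds is only the seconds component of the time span: 1.
        Console.WriteLine(elapsed.Seconds);

        // TotalSeconds is the whole duration expressed in seconds: 61.5.
        Console.WriteLine(elapsed.TotalSeconds);
    }
}
```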

Answer 3

Your code snippets are not equivalent. Here is a version of ParallelThread that does the same thing as ParallelProcess, but by starting new threads:

public static bool ParallelThread()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    var threads = new Thread[10];
    for (int i = 0; i < 10; i++)
    {
        int x = i;
        threads[i] = new Thread(() => Thread1(x));
        threads[i].Start();
    }
    for (int i = 0; i < 10; i++)
    {
        threads[i].Join();
    }
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

private static void Thread1(int x)
{
    Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
           Thread.CurrentThread.ManagedThreadId));
    Thread.Sleep(3000);
}

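Here I make sure to wait for all the threads, and I make sure the console output matches. The OP's code does neither.

The time difference persists, though.

Let me tell you what makes the difference, at least in my tests: the order. Run ParallelThread before ParallelProcess and they should both take 3 seconds to complete (ignoring the initial run, which takes longer because of compilation). I can't really explain it.

We can further modify the code above to use the ThreadPool, which also results in ParallelProcess completing in 3 seconds (even though I did not modify that version). This is the version of ParallelThread using the ThreadPool that I came up with: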

public static bool ParallelThread()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    var events = new ManualResetEvent[10];
    for (int i = 0; i < 10; i++)
    {
        int x = i;
        events[x] = new ManualResetEvent(false);
        ThreadPool.QueueUserWorkItem
            (
                _ =>
                {
                    Thread1(x);
                    events[x].Set();
                }
            );
    }
    for (int i = 0; i < 10; i++)
    {
        events[i].WaitOne();
    }
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

private static void Thread1(int x)
{
    Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
           Thread.CurrentThread.ManagedThreadId));
    Thread.Sleep(3000);
}

Note: we could use WaitAll on the events, but that fails on an STAThread.
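As an alternative to one ManualResetEvent per work item, a single CountdownEvent can track all of them. This is a sketch of that swapped-in technique, not what the code above does; CountdownDemo and RunAndWait are hypothetical names.

```csharp
using System;
using System.Threading;

public static class CountdownDemo
{
    // Hypothetical helper: queues `count` work items on the thread pool,
    // blocks until all of them have run, and returns how many completed.
    public static int RunAndWait(int count, Action<int> work)
    {
        int completed = 0;
        using (var done = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                int x = i; // copy so each closure captures its own value
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    work(x);
                    Interlocked.Increment(ref completed);
                    done.Signal(); // one signal per finished work item
                });
            }
            done.Wait(); // returns once the count reaches zero
        }
        return completed;
    }

    public static void Main()
    {
        int n = RunAndWait(10, x =>
            Console.WriteLine($"Printing {x} thread = {Thread.CurrentThread.ManagedThreadId}"));
        Console.WriteLine($"Completed {n} work items");
    }
}
```

The sketch assumes the work delegate does not throw; an unhandled exception on a pool thread would take the process down before Wait returns.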

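You have Thread.Sleep(3000), and that is the 3 seconds we are seeing. It means we are not really measuring the overhead of either approach.

So I decided to look into it further. To do so, I went up an order of magnitude (from 10 to 100) and removed the Console.WriteLine (which introduces synchronization anyway).

This is my code listing: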

void Main()
{
    ParallelThread();
    ParallelProcess();
}

public static bool ParallelProcess()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    Parallel.For(0, 100, x =>
    {
        /*Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
            Thread.CurrentThread.ManagedThreadId));*/
        Thread.Sleep(3000);
    });
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

public static bool ParallelThread()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    var events = new ManualResetEvent[100];
    for (int i = 0; i < 100; i++)
    {
        int x = i;
        events[x] = new ManualResetEvent(false);
        ThreadPool.QueueUserWorkItem
            (
                _ =>
                {
                    Thread1(x);
                    events[x].Set();
                }
            );
    }
    for (int i = 0; i < 100; i++)
    {
        events[i].WaitOne();
    }
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

private static void Thread1(int x)
{
    /*Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
           Thread.CurrentThread.ManagedThreadId));*/
    Thread.Sleep(3000);
}

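With this I get 6 seconds for ParallelThread and 9 seconds for ParallelProcess. That stays true even after reversing the order, which makes me more confident that this is a real measure of the overhead.

Adding ThreadPool.SetMinThreads(100, 100); brings the time back down to 3 seconds for both ParallelThread (remember that this version uses the ThreadPool) and ParallelProcess. That means the overhead comes from the thread pool. Now I can go back to the version that spawns new threads (modified to spawn 100, and with Console.WriteLine commented out):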

public static bool ParallelThread()
{
    Stopwatch sw = new Stopwatch();

    sw.Start();
    var threads = new Thread[100];
    for (int i = 0; i < 100; i++)
    {
        int x = i;
        threads[i] = new Thread(() => Thread1(x));
        threads[i].Start();
    }
    for (int i = 0; i < 100; i++)
    {
        threads[i].Join();
    }
    sw.Stop();
    Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));

    return true;
}

private static void Thread1(int x)
{
    /*Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
           Thread.CurrentThread.ManagedThreadId));*/
    Thread.Sleep(3000);
}

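I get a consistent 3 seconds from this version (which means the time overhead is negligible since, as I said earlier, Thread.Sleep(3000) accounts for the 3 seconds). However, I want to point out that it leaves more garbage to collect than using the ThreadPool or Parallel.For. On the other hand, Parallel.For remains tied to the ThreadPool. By the way, if you want to degrade its performance, reducing the minimum number of threads is not enough; you have to reduce the maximum as well (e.g. ThreadPool.SetMaxThreads(1, 1);). Note that SetMaxThreads returns false and changes nothing when the requested maximum is below the number of processors, so check its return value.

In conclusion, note that Parallel.For is easier to use and harder to get wrong.

Starting 10 threads appears to be faster. Can someone explain why?

Spawning threads is fast. It does lead to more garbage, though. Also, note that your test is not a very good one.

Do threads use the multiple processors available in the system (to execute in parallel), or are they just time-sliced by the CLR?

Yes, they do use them. They map to underlying operating system threads, can be preempted by the operating system, and will run on any core according to their affinity (see ProcessThread.ProcessorAffinity). To be clear, they are neither fibers nor coroutines.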

Answer 4

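In the simplest terms, using the Thread class guarantees that a thread is created at the operating system level, whereas with Parallel.For the CLR thinks twice before spawning OS-level threads. If it decides it is a good time to create a thread at the OS level it goes ahead; otherwise it uses an available thread pool thread. The TPL is written to be optimized for multi-core environments.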
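One way to see the distinction is Thread.CurrentThread.IsThreadPoolThread. The sketch below (PoolThreadDemo and its methods are illustrative names) checks a dedicated thread and a queued work item, then prints the flag for each Parallel.For iteration. Parallel.For may also run some iterations on the calling thread, so a mix of values is possible there.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PoolThreadDemo
{
    // Code started with `new Thread` runs on a dedicated, non-pool thread.
    public static bool DedicatedThreadIsPoolThread()
    {
        bool isPool = true;
        var t = new Thread(() => isPool = Thread.CurrentThread.IsThreadPoolThread);
        t.Start();
        t.Join();
        return isPool;
    }

    // Work queued through ThreadPool.QueueUserWorkItem runs on a pool thread.
    public static bool QueuedWorkIsPoolThread()
    {
        bool isPool = false;
        using (var done = new ManualResetEventSlim(false))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                isPool = Thread.CurrentThread.IsThreadPoolThread;
                done.Set();
            });
            done.Wait();
        }
        return isPool;
    }

    public static void Main()
    {
        Console.WriteLine($"new Thread on pool thread: {DedicatedThreadIsPoolThread()}");
        Console.WriteLine($"QueueUserWorkItem on pool thread: {QueuedWorkIsPoolThread()}");

        Parallel.For(0, 10, x =>
            Console.WriteLine($"Iteration {x}: thread {Thread.CurrentThread.ManagedThreadId}, " +
                              $"pool thread = {Thread.CurrentThread.IsThreadPoolThread}"));
    }
}
```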

Thread vs. Parallel performance
