Binding the source enumeration to a thread in PLINQ

Date: 2014-09-11 21:08:36

Tags: c# multithreading linq parallel-processing plinq

I have a computation that I am parallelizing with PLINQ, as follows:

  • An IEnumerable<T> source provides objects read from a file

  • I need to run a heavyweight computation, HeavyComputation, on each T, and I want those spread across threads, so I use PLINQ: AsParallel().Select(HeavyComputation)

Here is the interesting part: due to limitations of the file-reader type that provides source, I need source to be enumerated on the initial thread, not on the parallel workers. That is, I need the full evaluation of source to be bound to the main thread. However, the source actually seems to be enumerated on the worker threads.

My question is: is there a straightforward way to modify this code so that the enumeration of source is bound to the initial thread, while the heavy work is farmed out to the parallel workers? Keep in mind that just doing an eager .ToList() before the AsParallel() is not an option here, because the data stream coming from the file is enormous.

Here is some example code that demonstrates the problem I am seeing:

using System.Threading;
using System.Collections.Generic;
using System.Linq;
using System;

public class PlinqTest
{
        private static string FormatItems<T>(IEnumerable<T> source)
        {
                return String.Format("[{0}]", String.Join(";", source));
        }

        public static void Main()
        {
            var expectedThreadIds = new[] { Thread.CurrentThread.ManagedThreadId };

            var threadIds = Enumerable.Range(1, 1000)
                    .Select(x => Thread.CurrentThread.ManagedThreadId) // (1)
                    .AsParallel()
                    .WithDegreeOfParallelism(8)
                    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
                    .AsOrdered()
                    .Select(x => x)                                    // (2)
                    .ToArray();

            // In the computation above, the lambda in (1) is a
            // stand in for the file-reading operation that we
            // want to be bound to the main thread, while the
            // lambda in (2) is a stand-in for the "expensive
            // computation" that we want to be farmed out to the
            // parallel worker threads.  In fact, (1) is being
            // executed on all threads, as can be seen from the
            // output.

            Console.WriteLine("Expected thread IDs: {0}",
                              FormatItems(expectedThreadIds));
            Console.WriteLine("Found thread IDs: {0}",
                              FormatItems(threadIds.Distinct()));
        }
}

The sample output I get is:

Expected thread IDs: [1]
Found thread IDs: [7;4;8;6;11;5;10;9]

2 answers:

Answer 0 (score: 1)

This is fairly straightforward (although perhaps not as concise) if you abandon PLINQ and explicitly use the Task Parallel Library:

// Limits the parallelism of the "expensive task"
var semaphore = new SemaphoreSlim(8);

var tasks = Enumerable.Range(1, 1000)
    .Select(x => Thread.CurrentThread.ManagedThreadId)
    .Select(async x =>
    {
        await semaphore.WaitAsync();
        try
        {
            // Farm the expensive work out to a ThreadPool thread
            return await Task.Run(() => Tuple.Create(x, Thread.CurrentThread.ManagedThreadId));
        }
        finally
        {
            semaphore.Release(); // released even if the computation throws
        }
    });

return Task.WhenAll(tasks).Result;

Note that I am using Tuple.Create to record both the thread ID coming from the main thread and the thread ID from the spawned task. From my tests, the former is always the same for every tuple, while the latter varies, which is as it should be.

The semaphore ensures that the degree of parallelism never exceeds 8 (although that is unlikely anyway given the cheap cost of creating a tuple). If you do reach 8, any new tasks will wait until a spot becomes available on the semaphore.
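Wrapped in a complete, runnable method, the snippet above might look like the sketch below. The class and method names (`TplDemo.Run`) are mine, not from the answer; the structure is otherwise the same, with `.ToList()` added to make explicit that the source projection (the stand-in for the file read) is enumerated on the calling thread before the results are awaited.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TplDemo
{
    // Illustrative wrapper around the answer's snippet.
    public static Tuple<int, int>[] Run()
    {
        // Limits the parallelism of the "expensive task"
        var semaphore = new SemaphoreSlim(8);

        var tasks = Enumerable.Range(1, 1000)
            .Select(x => Thread.CurrentThread.ManagedThreadId) // runs on the enumerating (calling) thread
            .Select(async x =>
            {
                await semaphore.WaitAsync();
                try
                {
                    // The "heavy computation" stand-in, on a ThreadPool thread
                    return await Task.Run(() => Tuple.Create(x, Thread.CurrentThread.ManagedThreadId));
                }
                finally
                {
                    semaphore.Release();
                }
            })
            .ToList(); // force enumeration here, on the current thread

        return Task.WhenAll(tasks).Result;
    }
}
```

Because LINQ is lazy, it is the `.ToList()` (or the enumeration inside `Task.WhenAll`) that actually runs the first `Select`; either way it happens on the thread that calls `Run`, so every tuple's first element should be that thread's ID.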

Answer 1 (score: 0)

You could use the OffloadQueryEnumeration method below, which ensures that the enumeration of the source sequence happens on the same thread that enumerates the resulting IEnumerable<TResult>. The querySelector is a delegate that converts a proxy of the source sequence into a query (an IEnumerable<TResult>). This query is enumerated internally on a ThreadPool thread, but the output values are surfaced back on the current thread.

/// <summary>Enumerates the source sequence on the current thread, and enumerates
/// the projected query on a ThreadPool thread.</summary>
public static IEnumerable<TResult> OffloadQueryEnumeration<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<IEnumerable<TSource>, IEnumerable<TResult>> querySelector)
{
    // Arguments validation omitted
    // Requires: using System.Runtime.ExceptionServices; (for ExceptionDispatchInfo)
    var locker = new object();
    (TSource Value, bool HasValue) input = default; bool inputCompleted = false;
    (TResult Value, bool HasValue) output = default; bool outputCompleted = false;
    using var sourceEnumerator = source.GetEnumerator();

    IEnumerable<TSource> GetSourceProxy()
    {
        while (true)
        {
            TSource item;
            lock (locker)
            {
                if (!input.HasValue)
                {
                    if (inputCompleted) yield break;
                    Monitor.Wait(locker); continue;
                }
                item = input.Value; input = default;
                Monitor.PulseAll(locker);
            }
            yield return item;
        }
    }

    var query = querySelector(GetSourceProxy());

    var task = Task.Run(() =>
    {
        try
        {
            foreach (var result in query)
            {
                lock (locker)
                {
                    while (output.HasValue) Monitor.Wait(locker);
                    output = (result, true);
                    Monitor.PulseAll(locker);
                }
            }
        }
        finally
        {
            lock (locker) { outputCompleted = true; Monitor.PulseAll(locker); }
        }
    });

    Exception sourceEnumeratorException = null;
    while (true)
    {
        TResult result;
        lock (locker)
        {
            if (output.HasValue)
            {
                result = output.Value; output = default;
                Monitor.PulseAll(locker);
                goto yieldResult;
            }
            if (outputCompleted) break;
            if (input.HasValue || inputCompleted)
            {
                Monitor.Wait(locker); continue;
            }
            try
            {
                if (sourceEnumerator.MoveNext())
                    input = (sourceEnumerator.Current, true);
                else
                    inputCompleted = true;
            }
            catch (Exception ex)
            {
                sourceEnumeratorException = ex;
                inputCompleted = true;
            }
            Monitor.PulseAll(locker); continue;
        }
    yieldResult:
        yield return result;
    }

    task.GetAwaiter().GetResult(); // Propagate possible exceptions
    lock (locker) if (sourceEnumeratorException != null)
        ExceptionDispatchInfo.Capture(sourceEnumeratorException).Throw();
}

This method uses the Monitor.Wait/Monitor.Pulse mechanism in order to synchronize the transfer of values from one thread to the other.
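The core of that mechanism is a one-slot hand-off guarded by a single lock: the producer waits until the slot is empty, fills it, and pulses; the consumer waits until the slot is full, drains it, and pulses back. The sketch below (a simplified illustration I wrote, not part of the answer's method) demonstrates just that hand-off between a ThreadPool thread and the current thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class HandoffDemo
{
    public static List<int> Run()
    {
        var locker = new object();
        (int Value, bool HasValue) slot = default;
        bool completed = false;

        // Producer: fills the one-element slot on a ThreadPool thread.
        var producer = Task.Run(() =>
        {
            for (int i = 1; i <= 5; i++)
            {
                lock (locker)
                {
                    while (slot.HasValue) Monitor.Wait(locker); // wait until the consumer empties the slot
                    slot = (i, true);
                    Monitor.PulseAll(locker); // wake the consumer
                }
            }
            lock (locker) { completed = true; Monitor.PulseAll(locker); }
        });

        // Consumer: drains the slot on the current thread.
        var results = new List<int>();
        while (true)
        {
            lock (locker)
            {
                if (!slot.HasValue)
                {
                    if (completed) break;
                    Monitor.Wait(locker); continue; // wait for the producer
                }
                results.Add(slot.Value);
                slot = default;
                Monitor.PulseAll(locker); // wake the producer
            }
        }
        producer.GetAwaiter().GetResult(); // propagate possible exceptions
        return results;
    }
}
```

OffloadQueryEnumeration does the same thing twice (once for the input slot, once for the output slot), interleaved inside a single loop on the current thread.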

Usage example:

int[] threadIds = Enumerable
    .Range(1, 1000)
    .Select(x => Thread.CurrentThread.ManagedThreadId)
    .OffloadQueryEnumeration(proxy => proxy
        .AsParallel()
        .WithDegreeOfParallelism(8)
        .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
        .AsOrdered()
        .Select(x => x)
    )
    .ToArray();