How to process items in parallel and then merge the results?

Time: 2012-11-21 16:10:28

Tags: c# concurrency merge task-parallel-library tpl-dataflow

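I am faced with the following problem:

I have a data stream of Foo objects and stream those objects to several concurrent in-process tasks/threads, which in turn process the objects and output FooResult objects. Each FooResult contains, among other members, the same Foo that was used to create it. However, not every Foo necessarily produces a FooResult.

My problem is that I want to pass on from this whole process a wrapper object that contains the original Foo and all FooResult objects that may have been created from that Foo within the concurrent tasks.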

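Note: I currently use TPL Dataflow. Each concurrent process happens within an ActionBlock<Foo> that is linked to from a BroadCastBlock<Foo>. It uses SendAsync() to a target dataflow block to send a FooResult if one was created. Obviously the concurrent dataflow blocks produce FooResult objects at unpredictable times, which is what I am currently struggling with. I cannot figure out how many FooResult objects were created across all ActionBlock<Foo> instances, so that I can bundle them with the originating Foo and pass them on as a wrapper object.

In pseudocode, it currently looks as follows:

BroadCastBlock<Foo> broadCastBlock;
ActionBlock<Foo> aBlock1;
ActionBlock<Foo> aBlock2;
ActionBlock<FooResult> targetBlock;

broadCastBlock.LinkTo(aBlock1);
broadCastBlock.LinkTo(aBlock2);

aBlock1 = new ActionBlock<Foo>(foo =>
{
    //do something here. Sometimes create a FooResult. If then
    targetBlock.SendAsync(fooResult);
});

//similar for aBlock2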

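The problem with the current code, however, is that targetBlock may not receive anything if a Foo did not produce a single FooResult in any of the action blocks. It may also receive 2 FooResult objects for the same Foo, because each action block produced one.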

What I want is for targetBlock to receive a wrapper object that contains each Foo and, if any FooResult objects were created, a collection of those FooResult objects as well.

Any ideas on what I can do to make the solution work as described? It does not have to use TPL Dataflow, but it would be neat if it did.

UPDATE: The following is what I got from implementing JoinBlock as suggested by svick. I am not going to use it (unless it can be tweaked performance-wise), because it is extremely slow: I get to about 89,000 items per second (and those are only int value types).

public Test()
{
    broadCastBlock = new BroadcastBlock<int>(i => { return i; });
    transformBlock1 = new TransformBlock<int, int>(i => { return i; });
    transformBlock2 = new TransformBlock<int, int>(i => { return i; });
    joinBlock = new JoinBlock<int, int>();

    processorBlock = new ActionBlock<Tuple<int, int>>(tuple =>
    {
        //Console.WriteLine("tfb1: " + tuple.Item1 + "tfb2: " + tuple.Item2);
    });

    //Linking
    broadCastBlock.LinkTo(transformBlock1, new DataflowLinkOptions { PropagateCompletion = true });
    broadCastBlock.LinkTo(transformBlock2, new DataflowLinkOptions { PropagateCompletion = true });
    transformBlock1.LinkTo(joinBlock.Target1);
    transformBlock2.LinkTo(joinBlock.Target2);
    joinBlock.LinkTo(processorBlock, new DataflowLinkOptions { PropagateCompletion = true });
}

public void Start()
{
    Stopwatch watch = new Stopwatch();
    watch.Start();

    const int numElements = 1000000;

    for (int i = 1; i <= numElements; i++)
    {
        broadCastBlock.Post(i);
    }

    ////mark completion
    broadCastBlock.Complete();
    Task.WhenAll(transformBlock1.Completion, transformBlock2.Completion).ContinueWith(_ => joinBlock.Complete());

    processorBlock.Completion.Wait();

    watch.Stop();

    Console.WriteLine("Time it took: " + watch.ElapsedMilliseconds + " - items processed per second: " + numElements / watch.ElapsedMilliseconds * 1000);
    Console.ReadLine();
}

Updated code to reflect the suggestions:

public class Test
{
    private BroadcastBlock<int> broadCastBlock;
    private TransformBlock<int, int> transformBlock1;
    private TransformBlock<int, int> transformBlock2;
    private JoinBlock<int, int, int> joinBlock;
    private ActionBlock<Tuple<int, int, int>> processorBlock;

    public Test()
    {
        broadCastBlock = new BroadcastBlock<int>(i => { return i; });

        transformBlock1 = new TransformBlock<int, int>(i => { return i; },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

        transformBlock2 = new TransformBlock<int, int>(i => { return i; },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

        joinBlock = new JoinBlock<int, int, int>();

        processorBlock = new ActionBlock<Tuple<int, int, int>>(tuple =>
        {
            //Console.WriteLine("original value: " + tuple.Item1 + "tfb1: " + tuple.Item2 + "tfb2: " + tuple.Item3);
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

        //Linking
        broadCastBlock.LinkTo(transformBlock1, new DataflowLinkOptions { PropagateCompletion = true });
        broadCastBlock.LinkTo(transformBlock2, new DataflowLinkOptions { PropagateCompletion = true });
        broadCastBlock.LinkTo(joinBlock.Target1, new DataflowLinkOptions { PropagateCompletion = true });
        transformBlock1.LinkTo(joinBlock.Target2, new DataflowLinkOptions { PropagateCompletion = true });
        transformBlock2.LinkTo(joinBlock.Target3, new DataflowLinkOptions { PropagateCompletion = true });
        joinBlock.LinkTo(processorBlock, new DataflowLinkOptions { PropagateCompletion = true });
    }

    public void Start()
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();

        const int numElements = 1000000;

        for (int i = 1; i <= numElements; i++)
        {
            broadCastBlock.Post(i);
        }

        ////mark completion
        broadCastBlock.Complete();

        processorBlock.Completion.Wait();

        watch.Stop();

        Console.WriteLine("Time it took: " + watch.ElapsedMilliseconds + " - items processed per second: " + numElements / watch.ElapsedMilliseconds * 1000);
        Console.ReadLine();
    }
}

2 answers:

Answer 0 (score: 1)

As far as I understand the problem:

lock foo
work on foo
if foo has not triggered sending a result
and fooResult exists
   send fooResult
   remember in foo that result has already been sent
unlock foo

Update after the OP's comment

So push foo into your BroadCastBlock:

BroadCastBlock<Foo> bcb = new BroadCastBlock<Foo>(foo);
...

if ( aBlock1.HasResult ) 
{
    bcb.Add( aBlock1.Result );
}

if ( aBlock2.HasResult ) 
{
    bcb.Add( aBlock2.Result );
}

Now you can query bcb for what is there and send what is needed (or just send bcb).

Update (after more discussion in the comments)

class NotificationWrapper<TSource, TResult>
{
    private readonly TSource originalSource;

    private readonly Queue<TResult> resultsGenerated = new Queue<TResult>();

    private int workerCount = 0;

    // A constructor carries the class name without the type parameters.
    public NotificationWrapper( TSource originalSource, int workerCount )
    {
        this.originalSource = originalSource;
        this.workerCount = workerCount;
    }

    // Called by a worker that finished without producing a result.
    public void NotifyActionDone()
    {
        lock( this )
        {
            --workerCount;
            if ( 0 == workerCount )
            {
                //do my sending; send() is a placeholder for forwarding the bundle
                send( originalSource, resultsGenerated );
            }
        }
    }

    // Called by a worker that finished with a result.
    public void NotifyActionDone( TResult result )
    {
        lock ( this )
        {
            resultsGenerated.Enqueue( result );
            NotifyActionDone();
        }
    }
}

And in the calling code:

NotificationWrapper<Foo, Fooresult> notificationWrapper = new NotificationWrapper<Foo, Fooresult>( foo, 2 );
ActionBlock<Foo> ab1 = new ActionBlock<Foo>( foo, notificationWrapper );
ActionBlock<Foo> ab2 = new ActionBlock<Foo>( foo, notificationWrapper );

The ActionBlocks then need to be changed to call NotifyActionDone() or NotifyActionDone( Fooresult ) once the calculation is done.
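To make the counting idea above concrete, here is a minimal runnable sketch. It is an illustration under stated assumptions, not the answerer's code: ItemTracker and TrackerDemo are hypothetical names, int and string stand in for Foo and FooResult, and the tracker itself (rather than the bare item) is posted to both worker blocks. It assumes the System.Threading.Tasks.Dataflow library is referenced.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical per-item tracker: counts down as each worker reports in.
public sealed class ItemTracker<TSource, TResult>
{
    private readonly object gate = new object();
    private readonly List<TResult> results = new List<TResult>();
    private readonly Action<TSource, IReadOnlyList<TResult>> onAllDone;
    private int pending;

    public TSource Source { get; }

    public ItemTracker(TSource source, int workerCount,
                       Action<TSource, IReadOnlyList<TResult>> onAllDone)
    {
        Source = source;
        pending = workerCount;
        this.onAllDone = onAllDone;
    }

    // Called by a worker that produced a result.
    public void Done(TResult result)
    {
        lock (gate) { results.Add(result); }
        Done();
    }

    // Called by a worker that produced nothing.
    public void Done()
    {
        bool last;
        lock (gate) { last = --pending == 0; }
        // Only the final worker forwards the bundle, exactly once per item.
        if (last) onAllDone(Source, results);
    }
}

public static class TrackerDemo
{
    // Returns, for each source item, how many results were bundled with it.
    public static SortedDictionary<int, int> Run()
    {
        var bundles = new ConcurrentDictionary<int, int>();
        Action<int, IReadOnlyList<string>> forward =
            (source, results) => bundles[source] = results.Count;

        // Worker 1 yields a result for even inputs, worker 2 for multiples of three.
        var worker1 = new ActionBlock<ItemTracker<int, string>>(t =>
        {
            if (t.Source % 2 == 0) t.Done("even"); else t.Done();
        });
        var worker2 = new ActionBlock<ItemTracker<int, string>>(t =>
        {
            if (t.Source % 3 == 0) t.Done("three"); else t.Done();
        });

        for (int i = 1; i <= 6; i++)
        {
            var tracker = new ItemTracker<int, string>(i, 2, forward);
            worker1.Post(tracker);
            worker2.Post(tracker);
        }

        worker1.Complete();
        worker2.Complete();
        Task.WaitAll(worker1.Completion, worker2.Completion);
        return new SortedDictionary<int, int>(bundles);
    }
}
```

Because the callback fires only when the counter reaches zero, every item is forwarded exactly once, even when no worker produced a result. This design does not depend on the order in which the workers finish.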

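Answer 1 (score: 1)

I can see two ways to solve this:

  1. Use JoinBlock. Your broadcast block and both worker blocks would each send to one target of the join block. If a worker block does not have a result, it would send null instead (or some other special value). Your worker blocks would need to change to TransformBlock<Foo, FooResult>, because using ActionBlock the way you did does not guarantee ordering (at least not when you set MaxDegreeOfParallelism), whereas TransformBlock does.

    The result of the JoinBlock would be a Tuple<Foo, FooResult, FooResult>, where either or both of the FooResult values could be null.

    I am not sure I like that this solution relies heavily on the correct ordering of items, though; that seems fragile to me.

  2. Use some other object for synchronization. That object would then be responsible for sending the result forward once all blocks are done with a given item. This is similar to the NotificationWrapper suggested by Mario in his answer.

    In this case, you could use TaskCompletionSource and Task.WhenAll() to take care of the synchronization.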
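The second approach could be sketched roughly as follows. This is only one possible reading of it, not code from the answer: Envelope and EnvelopeDemo are hypothetical names, int and string stand in for Foo and FooResult, and each worker fills its own TaskCompletionSource slot (with null when it has no result). It assumes the System.Threading.Tasks.Dataflow library is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical envelope: the source item plus one completion slot per worker.
public sealed class Envelope
{
    public int Source { get; }

    public TaskCompletionSource<string> Slot1 { get; } =
        new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);

    public TaskCompletionSource<string> Slot2 { get; } =
        new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);

    public Envelope(int source) { Source = source; }

    // Completes once both workers have reported; null entries are dropped.
    public async Task<(int Source, string[] Results)> BundleAsync()
    {
        string[] all = await Task.WhenAll(Slot1.Task, Slot2.Task);
        return (Source, all.Where(r => r != null).ToArray());
    }
}

public static class EnvelopeDemo
{
    public static async Task<string[]> RunAsync()
    {
        // Each worker always completes its slot, using null for "no result".
        var worker1 = new ActionBlock<Envelope>(e =>
            e.Slot1.SetResult(e.Source % 2 == 0 ? "even" : null));
        var worker2 = new ActionBlock<Envelope>(e =>
            e.Slot2.SetResult(e.Source % 3 == 0 ? "three" : null));

        var bundles = new List<Task<(int Source, string[] Results)>>();
        for (int i = 1; i <= 6; i++)
        {
            var envelope = new Envelope(i);
            worker1.Post(envelope);
            worker2.Post(envelope);
            bundles.Add(envelope.BundleAsync());
        }

        // Task.WhenAll returns the bundles in the order the tasks were added.
        var done = await Task.WhenAll(bundles);
        return done.Select(b => $"{b.Source}:{string.Join("+", b.Results)}").ToArray();
    }
}
```

Since every item carries its own slots, the bundle for one item never depends on the order in which other items are processed, which avoids the ordering concern raised for the JoinBlock approach.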