TPL Dataflow, BroadcastBlock to BatchBlocks

Date: 2015-04-25 05:04:58

Tags: c# concurrency task-parallel-library tpl-dataflow

I'm having trouble connecting BroadcastBlock(s) to BatchBlocks. The scenario is that the sources are BroadcastBlocks and the recipients are BatchBlocks.

In the simplified code below, only one of the two ActionBlocks ever executes. I even set each BatchBlock's batchSize to 1 to illustrate the problem.

Setting Greedy to "true" makes both ActionBlocks execute, but that's not what I want, because it lets a BatchBlock proceed even when its batch is not yet complete. Any ideas?

using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program
{
    static void Main(string[] args)
    {
        // My possible sources are BroadcastBlocks. Could be more
        var source1 = new BroadcastBlock<int>(z => z);

        // batch 1
        // can be many potential sources, one for now
        // I want all sources to arrive first before proceeding
        var batch1 = new BatchBlock<int>(1, new GroupingDataflowBlockOptions() { Greedy = false }); 
        var batch1Action = new ActionBlock<int[]>(arr =>
        {
            // this does not run sometimes
            Console.WriteLine("Received from batch 1 block!");
            foreach (var item in arr)
            {
                Console.WriteLine("Received {0}", item);
            }
        });

        batch1.LinkTo(batch1Action, new DataflowLinkOptions() { PropagateCompletion = true });

        // batch 2
        // can be many potential sources, one for now
        // I want all sources to arrive first before proceeding
        var batch2 = new BatchBlock<int>(1, new GroupingDataflowBlockOptions() { Greedy = false  });
        var batch2Action = new ActionBlock<int[]>(arr =>
        {
            // this does not run sometimes
            Console.WriteLine("Received from batch 2 block!");
            foreach (var item in arr)
            {
                Console.WriteLine("Received {0}", item);
            }
        });
        batch2.LinkTo(batch2Action, new DataflowLinkOptions() { PropagateCompletion = true });

        // connect source(s)
        source1.LinkTo(batch1, new DataflowLinkOptions() { PropagateCompletion = true });
        source1.LinkTo(batch2, new DataflowLinkOptions() { PropagateCompletion = true });

        // fire
        source1.SendAsync(3);

        Task.WaitAll(batch1Action.Completion, batch2Action.Completion);

        Console.ReadLine();
    }
}

2 Answers:

Answer 0 (score: 0)

You're misunderstanding what the Greedy flag does. If it equals true, your batch block will gather data even when there isn't yet a sufficient number of items to fill a batch. By setting Greedy = false you are telling TPL Dataflow "I will decide what gets posted to the batch blocks, not you," so the batch blocks may or may not decide to take the messages from the broadcast block.

Also, you are blocking a thread with the call Task.WaitAll(new Task[] { batch1Action.Completion, batch2Action.Completion });, since it blocks the main thread on each Completion task. This can lead to deadlock, because the thread is blocked before the messages can be posted through the whole pipeline. Moreover, you never call source1.Complete(), so this WaitAll call will never return.

What you really need is to leave Greedy set to true (the default), set the batch size to the value you need (2, for example), call the Complete() method, and avoid thread-blocking calls in your pipeline. By doing this, your batch blocks will get all the data from the broadcast, but no block will get more data before it has gathered everything needed for its batch.
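Putting that advice together, a minimal sketch of the corrected pipeline might look like the following (the batch size of 2, the second SendAsync, and the item values are illustrative, not taken from the question):

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program
{
    static async Task Main()
    {
        var source = new BroadcastBlock<int>(z => z);

        // Greedy mode (the default) with the desired batch size.
        var batch = new BatchBlock<int>(batchSize: 2);
        var action = new ActionBlock<int[]>(arr =>
            Console.WriteLine("Received {0}", string.Join(", ", arr)));

        source.LinkTo(batch, new DataflowLinkOptions { PropagateCompletion = true });
        batch.LinkTo(action, new DataflowLinkOptions { PropagateCompletion = true });

        await source.SendAsync(3);
        await source.SendAsync(4);

        // Signal that no more input is coming, then await completion
        // asynchronously instead of blocking with Task.WaitAll.
        source.Complete();
        await action.Completion;
    }
}
```

Because the BatchBlock is greedy, it accepts both items as soon as the BroadcastBlock offers them, emits the full batch, and the pipeline completes once Complete() propagates through the links.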

Answer 1 (score: 0)

There seems to be a flaw in the internal mechanism of the TPL Dataflow library that supports the non-greedy functionality. What happens is that a BatchBlock configured as non-greedy will Postpone all messages offered by linked blocks, instead of accepting them. It keeps the postponed messages in an internal queue, and when their number reaches its BatchSize configuration it attempts to consume the postponed messages; if that succeeds, they are propagated downstream as expected. The problem is that source blocks like the BroadcastBlock and the BufferBlock will stop offering more messages to a block that has postponed a previously offered message, until that single message is consumed. The combination of these two behaviors results in a deadlock: no progress can be made, because the BatchBlock is waiting for more messages to be offered before consuming the postponed ones, while the BroadcastBlock is waiting for the postponed messages to be consumed before offering more...

This happens only when the BatchSize is larger than 1 (which is the typical configuration for this block).

Here is a demonstration of this problem. As the source, the more common BufferBlock is used instead of a BroadcastBlock. Ten messages are posted to a three-block pipeline, and the expected behavior is for the messages to flow through the pipeline to the last block. In reality nothing happens, and all the messages remain stuck in the first block.

using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

public static class Program
{
    static void Main(string[] args)
    {
        var bufferBlock = new BufferBlock<int>();

        var batchBlock = new BatchBlock<int>(batchSize: 2,
            new GroupingDataflowBlockOptions() { Greedy = false });

        var actionBlock = new ActionBlock<int[]>(batch =>
            Console.WriteLine($"Received: {String.Join(", ", batch)}"));

        bufferBlock.LinkTo(batchBlock,
            new DataflowLinkOptions() { PropagateCompletion = true });

        batchBlock.LinkTo(actionBlock,
            new DataflowLinkOptions() { PropagateCompletion = true });

        for (int i = 1; i <= 10; i++)
        {
            var accepted = bufferBlock.Post(i);
            Console.WriteLine(
                $"bufferBlock.Post({i}) {(accepted ? "accepted" : "rejected")}");
            Thread.Sleep(100);
        }

        bufferBlock.Complete();
        actionBlock.Completion.Wait(millisecondsTimeout: 1000);
        Console.WriteLine();
        Console.WriteLine($"bufferBlock.Completion: {bufferBlock.Completion.Status}");
        Console.WriteLine($"batchBlock.Completion:  {batchBlock.Completion.Status}");
        Console.WriteLine($"actionBlock.Completion: {actionBlock.Completion.Status}");
        Console.WriteLine($"bufferBlock.Count: {bufferBlock.Count}");
    }
}

Output:

bufferBlock.Post(1) accepted
bufferBlock.Post(2) accepted
bufferBlock.Post(3) accepted
bufferBlock.Post(4) accepted
bufferBlock.Post(5) accepted
bufferBlock.Post(6) accepted
bufferBlock.Post(7) accepted
bufferBlock.Post(8) accepted
bufferBlock.Post(9) accepted
bufferBlock.Post(10) accepted

bufferBlock.Completion: WaitingForActivation
batchBlock.Completion:  WaitingForActivation
actionBlock.Completion: WaitingForActivation
bufferBlock.Count: 10

My guess is that the internal offer-consume-reserve-release mechanism has been fine-tuned for maximum efficiency in support of the BoundedCapacity feature, which is critical for many applications, while the rarely used Greedy = false feature has not been rigorously tested.

The good news is that in your case you don't really need to set Greedy to false. A BatchBlock in the default greedy mode will not propagate fewer messages than the configured BatchSize, unless it has been marked as completed and is propagating any leftover messages, or you call its TriggerBatch method manually at some arbitrary moment. The intended usage of the non-greedy configuration is to prevent resource starvation in complex graph scenarios, with multiple dependencies between blocks.
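As a sketch of the TriggerBatch option mentioned above (the batch size of 10 and the item values are illustrative): a greedy BatchBlock holds items until BatchSize is reached, but TriggerBatch flushes a partial batch immediately:

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class TriggerBatchDemo
{
    public static async Task Main()
    {
        var batchBlock = new BatchBlock<int>(batchSize: 10); // greedy by default
        var actionBlock = new ActionBlock<int[]>(batch =>
            Console.WriteLine($"Received batch of {batch.Length}: {string.Join(", ", batch)}"));

        batchBlock.LinkTo(actionBlock,
            new DataflowLinkOptions { PropagateCompletion = true });

        // Only 3 of the 10 items required for a full batch are posted...
        for (int i = 1; i <= 3; i++) batchBlock.Post(i);

        // ...so without this call nothing would be propagated yet.
        batchBlock.TriggerBatch(); // flushes the partial batch [1, 2, 3]

        batchBlock.Complete();
        await actionBlock.Completion;
    }
}
```

Calling Complete() alone would also have flushed the leftover items; TriggerBatch is useful when the pipeline must keep running and you want partial batches emitted on demand (for example, on a timer).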