BroadcastCopyBlock for TPL Dataflow with guaranteed delivery

Time: 2016-11-15 08:38:36

Tags: c# task-parallel-library broadcast tpl-dataflow

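I would be glad for some input on the following implementation of a BroadcastCopyBlock in TPL Dataflow. It copies each received message to all consumers that are registered with the BroadcastCopyBlock, and it guarantees delivery to every consumer that is linked to the block at the time the message is received. (This is unlike the BroadcastBlock, which does not guarantee delivery if the next message arrives before the previous one has been delivered to all consumers.)

My main concern is the reserving of messages and the releasing of reservations. What happens if a receiving block decides not to handle a message? My understanding is that this would create a memory leak, because the message would be kept indefinitely. I think I should somehow mark the message as unused, but I am not sure how. I was considering an artificial message sink (an ActionBlock with no action), or can I simply mark a message as discarded?

Further input on the implementation is also appreciated.

This is probably almost a duplicate of the following question, but I would prefer to use my own class instead of a method that creates the block. Or would that be considered bad style?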
BroadcastBlock with Guaranteed Delivery in TPL Dataflow

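using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

/// <summary>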
/// Broadcasts the same message to multiple consumers. This does NOT clone the message, all consumers receive an identical message
/// </summary>
/// <typeparam name="T"></typeparam>
public class BroadcastCopyBlock<T> : IPropagatorBlock<T, T>
{
    private ITargetBlock<T> In { get; }

    /// <summary>
    /// Holds a TransformBlock for each target, that subscribed to this block
    /// </summary>
    private readonly IDictionary<ITargetBlock<T>, TransformBlock<T, T>> _OutBlocks = new Dictionary<ITargetBlock<T>, TransformBlock<T, T>>();


    public BroadcastCopyBlock()
    {
        In = new ActionBlock<T>(message => Process(message));

        In.Completion.ContinueWith(task =>
                                   {
                                       if (task.Exception == null)
                                           Complete();
                                       else
                                           Fault(task.Exception);
                                   }
          );
    }

    /// <summary>
    /// Creates a transform source block for the passed target.
    /// </summary>
    /// <param name="target"></param>
    private void CreateOutBlock(ITargetBlock<T> target)
    {
        if (_OutBlocks.ContainsKey(target))
            return;

        var outBlock = new TransformBlock<T, T>(e => e);
        _OutBlocks[target] = outBlock;
    }

    private void Process(T message)
    {
        foreach (var outBlock in _OutBlocks.Values)
        {
            outBlock.Post(message);
        }
    }

    /// <inheritdoc />
    public DataflowMessageStatus OfferMessage(DataflowMessageHeader messageHeader, T messageValue, ISourceBlock<T> source, bool consumeToAccept)
    {
        return In.OfferMessage(messageHeader, messageValue, source, consumeToAccept);
    }

    /// <inheritdoc />
    public void Complete()
    {
        foreach (var outBlock in _OutBlocks.Values)
        {
            ((ISourceBlock<T>)outBlock).Complete();
        }
    }

    /// <inheritdoc />
    public void Fault(Exception exception)
    {
        foreach (var outBlock in _OutBlocks.Values)
        {
            ((ISourceBlock<T>)outBlock).Fault(exception);
        }
    }

    /// <inheritdoc />
    public Task Completion => Task.WhenAll(_OutBlocks.Select(b => b.Value.Completion));

    /// <inheritdoc />
    public IDisposable LinkTo(ITargetBlock<T> target, DataflowLinkOptions linkOptions)
    {
        CreateOutBlock(target);
        return _OutBlocks[target].LinkTo(target, linkOptions);
    }

    /// <inheritdoc />
    public T ConsumeMessage(DataflowMessageHeader messageHeader, ITargetBlock<T> target, out bool messageConsumed)
    {
        return ((ISourceBlock<T>)_OutBlocks[target]).ConsumeMessage(messageHeader, target, out messageConsumed);
    }

    /// <inheritdoc />
    public bool ReserveMessage(DataflowMessageHeader messageHeader, ITargetBlock<T> target)
    {
        return ((ISourceBlock<T>)_OutBlocks[target]).ReserveMessage(messageHeader, target);
    }

    /// <inheritdoc />
    public void ReleaseReservation(DataflowMessageHeader messageHeader, ITargetBlock<T> target)
    {
        ((ISourceBlock<T>)_OutBlocks[target]).ReleaseReservation(messageHeader, target);
    }
}

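2 answers:

Answer 0 (score: 4)

TL;DR
Your implementation uses the Post method inside the ActionBlock, which will still lose data if a target rejects the message. Switch to SendAsync instead. You also probably do not need to implement all of these methods; implementing the ITargetBlock<in TInput> interface is enough.

I want to clarify something before I come back to your main question. I think you are confused by some of the options in the TPL Dataflow library, so I will explain them a bit here. The behavior you describe, "The first consumer, which receives the message, deletes it from the queue", is not about the BroadcastBlock. It is what happens when multiple consumers are linked to an ISourceBlock such as a BufferBlock: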

var buffer = new BufferBlock<int>();
var consumer1 = new ActionBlock<int>(i => {});
var consumer2 = new ActionBlock<int>(i => { Console.WriteLine(i); });
buffer.LinkTo(consumer1);
buffer.LinkTo(consumer2);
// the message goes to only one consumer (consumer1, linked first), so nothing is printed
buffer.Post(1);

What the BroadcastBlock does is exactly what you are asking for. Consider this code:

private static void UnboundedCase()
{
    var broadcast = new BroadcastBlock<int>(i => i);
    var fastAction = new ActionBlock<int>(i => Console.WriteLine($"FAST Unbounded Block: {i}"));
    var slowAction = new ActionBlock<int>(i =>
        {
            Thread.Sleep(2000);
            Console.WriteLine($"SLOW Unbounded Block: {i}");
        });
    broadcast.LinkTo(slowAction, new DataflowLinkOptions { PropagateCompletion = true });
    broadcast.LinkTo(fastAction, new DataflowLinkOptions { PropagateCompletion = true });
    for (var i = 0; i < 3; ++i)
    {
        broadcast.SendAsync(i);
    }
    broadcast.Complete();
    slowAction.Completion.Wait();
}

Output

FAST Unbounded Block: 0
FAST Unbounded Block: 1
FAST Unbounded Block: 2
SLOW Unbounded Block: 0
SLOW Unbounded Block: 1
SLOW Unbounded Block: 2

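However, this only works if data arrives more slowly than it is processed. Otherwise the buffers keep growing and you will quickly run out of memory, as you stated in your question. Let's see what happens if we use ExecutionDataflowBlockOptions to limit the incoming data buffer of the slow block: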

private static void BoundedCase()
{
    var broadcast = new BroadcastBlock<int>(i => i);
    var fastAction = new ActionBlock<int>(i => Console.WriteLine($"FAST Bounded Block: {i}"));
    var slowAction = new ActionBlock<int>(i =>
        {
            Thread.Sleep(2000);
            Console.WriteLine($"SLOW Bounded Block: {i}");
        }, new ExecutionDataflowBlockOptions { BoundedCapacity = 2 });
    broadcast.LinkTo(slowAction, new DataflowLinkOptions { PropagateCompletion = true });
    broadcast.LinkTo(fastAction, new DataflowLinkOptions { PropagateCompletion = true });
    for (var i = 0; i < 3; ++i)
    {
        broadcast.SendAsync(i);
    }
    broadcast.Complete();
    slowAction.Completion.Wait();
}

Output

FAST Bounded Block: 0
FAST Bounded Block: 1
FAST Bounded Block: 2
SLOW Bounded Block: 0
SLOW Bounded Block: 1

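As you can see, our slow block lost the last message, which is not what we are looking for. The reason is that the BroadcastBlock by default uses the Post method to deliver messages. According to the official Intro Document: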

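
  • Post
    • An extension method that asynchronously posts to the target block. It returns immediately whether the data could be accepted or not, and it does not allow the target to consume the message at a later time.
  • SendAsync
    • An extension method that asynchronously sends to target blocks while supporting buffering. A Post operation on a target is asynchronous, but if a target wants to postpone the offered data, there is nowhere for the data to be buffered and the target must instead be forced to decline. SendAsync enables asynchronous posting of the data with buffering, such that if a target postpones, it will later be able to retrieve the postponed data from the temporary buffer used for this one asynchronously posted message.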
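
The difference between the two methods can be seen in isolation. The following is a minimal sketch, assuming a target with BoundedCapacity = 1 that is kept busy by a gate; the class name PostVersusSendAsyncDemo and the gate are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PostVersusSendAsyncDemo
{
    // Reports whether Post and SendAsync were accepted by a target that is already full.
    public static async Task<(bool PostAccepted, bool SendAccepted)> RunAsync()
    {
        // Hypothetical gate that keeps the target busy with its first message.
        var gate = new ManualResetEventSlim(false);
        var target = new ActionBlock<int>(
            _ => gate.Wait(),
            new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        target.Post(0);                            // takes the only available slot
        bool postAccepted = target.Post(1);        // target is full, so Post is declined
        Task<bool> pending = target.SendAsync(2);  // the offer is postponed, not dropped

        gate.Set();                                // let the target finish message 0
        bool sendAccepted = await pending;         // completes once the target takes message 2

        target.Complete();
        await target.Completion;
        return (postAccepted, sendAccepted);
    }
}
```

Here Post reports a decline while the target is full, whereas the task returned by SendAsync stays pending until the target has room and then completes with true.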

So this method could help us in our mission. Let's introduce a wrapper ActionBlock that does exactly what we want: SendAsync the data to our real processors:

private static void BoundedWrapperInfiniteCase()
{
    var broadcast = new BroadcastBlock<int>(i => i);
    var fastAction = new ActionBlock<int>(i => Console.WriteLine($"FAST Wrapper Block: {i}"));
    var slowAction = new ActionBlock<int>(i =>
    {
        Thread.Sleep(2000);
        Console.WriteLine($"SLOW Wrapper Block: {i}");
    }, new ExecutionDataflowBlockOptions { BoundedCapacity = 2 });
    var fastActionWrapper = new ActionBlock<int>(i => fastAction.SendAsync(i));
    var slowActionWrapper = new ActionBlock<int>(i => slowAction.SendAsync(i));

    broadcast.LinkTo(slowActionWrapper, new DataflowLinkOptions { PropagateCompletion = true });
    broadcast.LinkTo(fastActionWrapper, new DataflowLinkOptions { PropagateCompletion = true });
    for (var i = 0; i < 3; ++i)
    {
        broadcast.SendAsync(i);
    }
    broadcast.Complete();
    slowAction.Completion.Wait();
}

Output

FAST Wrapper Block: 0
FAST Wrapper Block: 1
FAST Wrapper Block: 2
SLOW Wrapper Block: 0
SLOW Wrapper Block: 1
SLOW Wrapper Block: 2

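But this wait will never end: our basic wrapper does not propagate completion to the inner blocks, and an ActionBlock cannot be linked to anything. We can try to wait for the wrapper's completion instead: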

private static void BoundedWrapperFiniteCase()
{
    var broadcast = new BroadcastBlock<int>(i => i);
    var fastAction = new ActionBlock<int>(i => Console.WriteLine($"FAST finite Block: {i}"));
    var slowAction = new ActionBlock<int>(i =>
        {
            Thread.Sleep(2000);
            Console.WriteLine($"SLOW finite Block: {i}");
        }, new ExecutionDataflowBlockOptions { BoundedCapacity = 2 });
    var fastActionWrapper = new ActionBlock<int>(i => fastAction.SendAsync(i));
    var slowActionWrapper = new ActionBlock<int>(i => slowAction.SendAsync(i));
    broadcast.LinkTo(slowActionWrapper, new DataflowLinkOptions { PropagateCompletion = true });
    broadcast.LinkTo(fastActionWrapper, new DataflowLinkOptions { PropagateCompletion = true });
    for (var i = 0; i < 3; ++i)
    {
        broadcast.SendAsync(i);
    }
    broadcast.Complete();
    slowActionWrapper.Completion.Wait();
}

Output

FAST finite Block: 0
FAST finite Block: 1
FAST finite Block: 2
SLOW finite Block: 0

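This is definitely not what we wanted. The wrapper ActionBlock finished all of its work, but nothing waited for the last message to be processed. Moreover, we do not even see the second message, because we exit the method before the Sleep call ends! So you definitely need your own implementation for this.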
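
One possible shape for such an implementation is sketched below. It is a rough illustration, not a drop-in replacement for the class in the question: it assumes the set of consumers is fixed when the block is created, and the names GuaranteedBroadcast, Create and GuaranteedBroadcastDemo are hypothetical. The block exposes only ITargetBlock<T>, forwards every message with SendAsync, and completes its consumers when it completes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class GuaranteedBroadcast
{
    // Hypothetical helper: forwards each message to every target and waits
    // until all of them have accepted it before taking the next message.
    public static ITargetBlock<T> Create<T>(params ITargetBlock<T>[] targets)
    {
        var block = new ActionBlock<T>(
            async item => { await Task.WhenAll(targets.Select(t => t.SendAsync(item))); },
            new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        // Pass completion (or the fault) on to the consumers.
        block.Completion.ContinueWith(t =>
        {
            foreach (var target in targets)
            {
                if (t.IsFaulted) target.Fault(t.Exception);
                else target.Complete();
            }
        });
        return block;
    }
}

public static class GuaranteedBroadcastDemo
{
    public static async Task<(List<int> Fast, List<int> Slow)> RunAsync()
    {
        var fast = new List<int>();
        var slow = new List<int>();
        var fastAction = new ActionBlock<int>(i => fast.Add(i));
        var slowAction = new ActionBlock<int>(async i =>
        {
            await Task.Delay(50);   // simulate a slow consumer
            slow.Add(i);
        }, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        var broadcast = GuaranteedBroadcast.Create<int>(fastAction, slowAction);
        for (var i = 0; i < 3; ++i)
            await broadcast.SendAsync(i);
        broadcast.Complete();

        // Wait for the real consumers, not for the wrapper.
        await Task.WhenAll(fastAction.Completion, slowAction.Completion);
        return (fast, slow);
    }
}
```

The trade-off of this design is that the slowest consumer throttles everyone: the next message is not forwarded until all targets have accepted the current one.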

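Now, at last, some thoughts about your code:

  1. You do not need to implement such a large number of methods. Your wrapper will be used as an ITargetBlock<in TInput>, so implement only that interface.
  2. Your implementation uses the Post method inside the ActionBlock, which, as we have seen, can lead to data loss if something goes wrong on the consumer's side. Use the SendAsync method instead.
  3. After that change, you should measure the performance of your dataflow. If you have many asynchronous waits for data to be delivered, you will probably see performance and/or memory problems. These should be fixable with some of the advanced settings discussed in the linked documentation.
  4. Your implementation of the Completion task actually reverses the order of your dataflow: you are waiting for the targets to complete, which I do not think is good practice. You should probably create an ending block for your dataflow (this could even be a NullTarget block, which simply drops the incoming messages synchronously) and wait for it to complete.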
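
For the discarding block mentioned in the last point, the library offers DataflowBlock.NullTarget<T>(). The sketch below assumes a BufferBlock source and a consumer that only wants even numbers; the class name NullTargetDemo and the filter are made up for illustration. Messages that no consumer accepts fall through to the null target instead of staying in the source indefinitely.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class NullTargetDemo
{
    public static async Task<List<int>> RunAsync()
    {
        var handled = new List<int>();
        var source = new BufferBlock<int>();
        var evens = new ActionBlock<int>(i => handled.Add(i));

        // Only even numbers are offered to the real consumer...
        source.LinkTo(evens,
            new DataflowLinkOptions { PropagateCompletion = true },
            i => i % 2 == 0);
        // ...and everything it does not take is accepted and dropped here.
        source.LinkTo(DataflowBlock.NullTarget<int>());

        for (var i = 0; i < 6; ++i)
            source.Post(i);
        source.Complete();

        // Without the null target, the first odd number would stay at the head
        // of the buffer and the source would never complete.
        await evens.Completion;
        return handled;
    }
}
```

The null target has to be linked last, because a source offers each message to its targets in the order in which they were linked.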

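Answer 1 (score: 0)

I just want to add to VMAtm's excellent answer that in BoundedWrapperInfiniteCase you can propagate completion manually. Add the following lines before the calls to broadcast.SendAsync() so that the action wrappers complete the inner actions, and then wait for both inner actions to complete: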

slowActionWrapper.Completion.ContinueWith(t =>
    {
        if (t.IsFaulted) ((IDataflowBlock)slowAction).Fault(t.Exception);
        else slowAction.Complete();
    });
fastActionWrapper.Completion.ContinueWith(t =>
    {
        if (t.IsFaulted) ((IDataflowBlock)fastAction).Fault(t.Exception);
        else fastAction.Complete();
    });

For example:

var broadcast = new BroadcastBlock<int>(i => i);
var fastAction = new ActionBlock<int>(i => Console.WriteLine($"FAST Wrapper Block: {i}"));
var slowAction = new ActionBlock<int>(i =>
    {
        Thread.Sleep(2000);
        Console.WriteLine($"SLOW Wrapper Block: {i}");
    }, new ExecutionDataflowBlockOptions { BoundedCapacity = 2 });
var fastActionWrapper = new ActionBlock<int>(i => fastAction.SendAsync(i));
var slowActionWrapper = new ActionBlock<int>(i => slowAction.SendAsync(i));

broadcast.LinkTo(slowActionWrapper, new DataflowLinkOptions { PropagateCompletion = true });
broadcast.LinkTo(fastActionWrapper, new DataflowLinkOptions { PropagateCompletion = true });

// Manually propagate completion to the inner actions
slowActionWrapper.Completion.ContinueWith(t =>
    {
        if (t.IsFaulted) ((IDataflowBlock)slowAction).Fault(t.Exception);
        else slowAction.Complete();
    });
fastActionWrapper.Completion.ContinueWith(t =>
    {
        if (t.IsFaulted) ((IDataflowBlock)fastAction).Fault(t.Exception);
        else fastAction.Complete();
    });

for (var i = 0; i < 3; ++i)
    broadcast.SendAsync(i);
broadcast.Complete();

// Wait for both inner actions to complete
Task.WaitAll(slowAction.Completion, fastAction.Completion);

The output will be the same as in VMAtm's answer, but all tasks will complete correctly.
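
Since the same continuation is written twice above, it could be factored into a small extension method. This is only a sketch: PropagateCompletionTo and CompletionExtensions are hypothetical names, not part of TPL Dataflow, and like the code above it treats anything other than a fault as normal completion.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class CompletionExtensions
{
    // Hypothetical helper: completes or faults 'target' once 'source' has finished.
    public static void PropagateCompletionTo(this IDataflowBlock source, IDataflowBlock target)
    {
        source.Completion.ContinueWith(t =>
        {
            if (t.IsFaulted) target.Fault(t.Exception);
            else target.Complete();
        }, TaskScheduler.Default);
    }
}
```

With this helper, the two continuations would become slowActionWrapper.PropagateCompletionTo(slowAction) and fastActionWrapper.PropagateCompletionTo(fastAction).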