BatchBlock produces a batch containing elements sent after TriggerBatch()

Asked: 2016-03-09 14:50:53

Tags: task-parallel-library tpl-dataflow

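I have a Dataflow pipeline made up of a few blocks. As elements flow through my processing pipeline, I want to group them by field A. To do this I use a BatchBlock with a large BoundedCapacity. I store my elements in it until I decide they should be released, and at that point I call the TriggerBatch() method.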

private void Forward(TStronglyTyped data)
{
    if (ShouldCreateNewGroup(data))
    {
        GroupingBlock.TriggerBatch();
    }

    GroupingBlock.SendAsync(data).Wait(SendTimeout);
}

That is what it looks like. The problem is that the resulting batch sometimes contains the next posted element, which should not be in it.

To illustrate:

BatchBlock.InputQueue = {A,A,A}
NextElement = B //we should trigger a Batch!
BatchBlock.TriggerBatch()
BatchBlock.SendAsync(B);

At this point I expect my batch to be {A,A,A}, but it is {A,A,A,B}.

It looks as if TriggerBatch() were asynchronous and SendAsync were executed before the batch was actually produced.

How can I solve this? I obviously don't want to put a Task.Wait(x) in there (I tried it, and it works, but of course performance is poor).

2 answers:

Answer 0 (score: 3)

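I ran into this problem too, by trying to call TriggerBatch in the wrong place. As mentioned, the SlidingWindow example that uses DataflowBlock.Encapsulate is the answer, but it took some time to adapt, so I thought I would share my completed block.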

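My ConditionalBatchBlock creates batches up to a maximum size, and emits a batch sooner if a certain condition is met. In my specific scenario I needed to create batches of 100 items, but always start a new batch when certain changes in the data were detected.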

public static IPropagatorBlock<T, T[]> CreateConditionalBatchBlock<T>(int batchSize, Func<Queue<T>, T, bool> condition)
{
    var queue = new Queue<T>();

    var source = new BufferBlock<T[]>();

    var target = new ActionBlock<T>(async item =>
    {
        // start a new batch if required by the condition
        if (condition(queue, item))
        {
            await source.SendAsync(queue.ToArray());
            queue.Clear();
        }

        queue.Enqueue(item);

        // always send a batch when the max size has been reached
        if (queue.Count == batchSize)
        {
            await source.SendAsync(queue.ToArray());
            queue.Clear();
        }
    });

    // send any remaining items
    target.Completion.ContinueWith(async t =>
    {
        if (queue.Any())
            await source.SendAsync(queue.ToArray());

        source.Complete();
    });

    return DataflowBlock.Encapsulate(target, source);
}

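The condition parameter may be simpler in your case. I needed to look at the queue as well as the current item to decide whether to start a new batch.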

I used it like this:

public async Task RunExampleAsync<T>()
{
    var conditionalBatchBlock = CreateConditionalBatchBlock<T>(100, (queue, currentItem) => ShouldCreateNewBatch(queue, currentItem));

    var actionBlock = new ActionBlock<T[]>(async x => await PerformActionAsync(x));

    conditionalBatchBlock.LinkTo(actionBlock, new DataflowLinkOptions { PropagateCompletion = true });

    await ReadDataAsync<T>(conditionalBatchBlock);

    await actionBlock.Completion;
}
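
For the question's case of grouping by field A, the condition could be a small predicate like the one below. This is only a sketch: the Element type, its A property, and the BatchConditions.FieldAChanged name are hypothetical and do not come from the original post. It also assumes the queue holds items of a single group at a time, so comparing against the oldest queued item is enough.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element type; the question only says elements are grouped by "field A".
public sealed class Element
{
    public string A { get; set; }
}

public static class BatchConditions
{
    // Returns true when the incoming item starts a new group: the queue already
    // holds items and their field A differs from the incoming item's field A.
    // The empty-queue check avoids emitting an empty batch for the very first item.
    public static bool FieldAChanged(Queue<Element> queue, Element item)
    {
        return queue.Count > 0
            && !string.Equals(queue.Peek().A, item.A, StringComparison.Ordinal);
    }
}
```

Since the method matches Func<Queue<T>, T, bool>, it could be passed directly, for example as CreateConditionalBatchBlock<Element>(100, BatchConditions.FieldAChanged).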

Answer 1 (score: 0)

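This is a specialized version of Loren Paulsen's CreateConditionalBatchBlock method. This version accepts a Func<TItem, TKey> keySelector argument, and emits a new batch every time an item with a different key is received.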

public static IPropagatorBlock<TItem, TItem[]> CreateConditionalBatchBlock<TItem, TKey>(
    Func<TItem, TKey> keySelector,
    DataflowBlockOptions dataflowBlockOptions = null,
    int maxBatchSize = DataflowBlockOptions.Unbounded,
    IEqualityComparer<TKey> keyComparer = null)
{
    if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
    if (maxBatchSize < 1 && maxBatchSize != DataflowBlockOptions.Unbounded)
        throw new ArgumentOutOfRangeException(nameof(maxBatchSize));

    keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
    var options = new ExecutionDataflowBlockOptions();
    if (dataflowBlockOptions != null)
    {
        options.BoundedCapacity = dataflowBlockOptions.BoundedCapacity;
        options.CancellationToken = dataflowBlockOptions.CancellationToken;
        options.MaxMessagesPerTask = dataflowBlockOptions.MaxMessagesPerTask;
        options.TaskScheduler = dataflowBlockOptions.TaskScheduler;
    }

    var output = new BufferBlock<TItem[]>(options);

    var queue = new Queue<TItem>(); // Synchronization is not needed
    TKey previousKey = default;

    var input = new ActionBlock<TItem>(async item =>
    {
        var key = keySelector(item);
        if (queue.Count > 0 && !keyComparer.Equals(key, previousKey))
        {
            await output.SendAsync(queue.ToArray()).ConfigureAwait(false);
            queue.Clear();
        }
        queue.Enqueue(item);
        previousKey = key;

        if (queue.Count == maxBatchSize)
        {
            await output.SendAsync(queue.ToArray()).ConfigureAwait(false);
            queue.Clear();
        }
    }, options);

    _ = input.Completion.ContinueWith(async t =>
    {
        if (queue.Count > 0)
        {
            await output.SendAsync(queue.ToArray()).ConfigureAwait(false);
            queue.Clear();
        }
        if (t.IsFaulted)
        {
            ((IDataflowBlock)output).Fault(t.Exception.InnerException);
        }
        else
        {
            output.Complete();
        }
    }, TaskScheduler.Default);

    return DataflowBlock.Encapsulate(input, output);
}
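
To see which batches this rule should produce for the question's input, the same grouping logic can be traced with a plain synchronous iterator. This is a Dataflow-free sketch, not a replacement for the block above: BatchSketch and BatchByKey are hypothetical names, and here a non-positive maxBatchSize is assumed to mean "no size limit".

```csharp
using System;
using System.Collections.Generic;

public static class BatchSketch
{
    // Splits a sequence into arrays of consecutive items that share a key.
    // A maxBatchSize of zero or less means "no size limit" in this sketch.
    public static IEnumerable<T[]> BatchByKey<T, TKey>(
        IEnumerable<T> items, Func<T, TKey> keySelector, int maxBatchSize = -1)
    {
        var comparer = EqualityComparer<TKey>.Default;
        var queue = new List<T>();
        TKey previousKey = default;

        foreach (var item in items)
        {
            var key = keySelector(item);

            // Key changed: emit what was collected before accepting the new item.
            if (queue.Count > 0 && !comparer.Equals(key, previousKey))
            {
                yield return queue.ToArray();
                queue.Clear();
            }

            queue.Add(item);
            previousKey = key;

            // Size limit reached: emit immediately.
            if (queue.Count == maxBatchSize)
            {
                yield return queue.ToArray();
                queue.Clear();
            }
        }

        // Emit the trailing partial batch, if any.
        if (queue.Count > 0)
        {
            yield return queue.ToArray();
        }
    }
}
```

Calling BatchSketch.BatchByKey(new[] { "A", "A", "A", "B" }, s => s) yields two arrays, {A,A,A} followed by {B}, which is the grouping the question expected from TriggerBatch().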