我有一个由几个块组成的Dataflow管道。
当元素流经我的处理管道时,我想按字段A
对它们进行分组。要做到这一点,我的BatchBlock
高BoundedCapacity
。在其中我存储我的元素,直到我决定它们应该被释放。所以我调用TriggerBatch()
方法。
private void Forward(TStronglyTyped data)
{
if (ShouldCreateNewGroup(data))
{
GroupingBlock.TriggerBatch();
}
GroupingBlock.SendAsync(data).Wait(SendTimeout);
}
这就是它的样子。 问题是,生成的批次有时包含下一个已发布的元素,该元素不应存在。
举例说明:
BatchBlock.InputQueue = {A,A,A}
NextElement = B //we should trigger a Batch!
BatchBlock.TriggerBatch()
BatchBlock.SendAsync(B);
在这一点上,我希望我的批次为{A,A,A}
,但它是{A,A,A,B}
类似TriggerBatch()
是异步的,SendAsync
实际上是在实际批处理之前执行的。
我该如何解决这个问题?
我显然不想把Task.Wait(x)
放在那里(我试过,但是它有效,但当然性能很差)。
答案 0 :(得分:3)
我也是通过尝试在错误的位置调用TriggerBatch
来遇到此问题的。如上所述,使用DataflowBlock.Encapsulate
的SlidingWindow示例就是答案,但需要一些时间才能适应,所以我想我会分享已完成的块。
我的ConditionalBatchBlock
创建最大尺寸的批次,如果满足某个条件,可能会更快。在我的特定场景中,我需要创建100个批次,但在检测到数据中的某些更改时始终创建新批次。
public static IPropagatorBlock<T, T[]> CreateConditionalBatchBlock<T>(int batchSize, Func<Queue<T>, T, bool> condition)
{
var queue = new Queue<T>();
var source = new BufferBlock<T[]>();
var target = new ActionBlock<T>(async item =>
{
// start a new batch if required by the condition
if (condition(queue, item))
{
await source.SendAsync(queue.ToArray());
queue.Clear();
}
queue.Enqueue(item);
// always send a batch when the max size has been reached
if (queue.Count == batchSize)
{
await source.SendAsync(queue.ToArray());
queue.Clear();
}
});
// send any remaining items
target.Completion.ContinueWith(async t =>
{
if (queue.Any())
await source.SendAsync(queue.ToArray());
source.Complete();
});
return DataflowBlock.Encapsulate(target, source);
}
condition
参数在您的情况下可能更简单。我需要查看队列以及当前项目以确定是否创建新批次。
我这样用过:
public async Task RunExampleAsync<T>()
{
var conditionalBatchBlock = CreateConditionalBatchBlock<T>(100, (queue, currentItem) => ShouldCreateNewBatch(queue, currentItem));
var actionBlock = new ActionBlock<T[]>(async x => await PerformActionAsync(x));
conditionalBatchBlock.LinkTo(actionBlock, new DataflowLinkOptions { PropagateCompletion = true });
await ReadDataAsync<T>(conditionalBatchBlock);
await actionBlock.Completion;
}
答案 1 :(得分:0)
这是Loren Paulsen的CreateConditionalBatchBlock
方法的专门版本。该参数接受Func<TItem, TKey> keySelector
参数,并且每次收到具有不同密钥的项目时都会发出新的批处理。
public static IPropagatorBlock<TItem, TItem[]> CreateConditionalBatchBlock<TItem, TKey>(
Func<TItem, TKey> keySelector,
DataflowBlockOptions dataflowBlockOptions = null,
int maxBatchSize = DataflowBlockOptions.Unbounded,
IEqualityComparer<TKey> keyComparer = null)
{
if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
if (maxBatchSize < 1 && maxBatchSize != DataflowBlockOptions.Unbounded)
throw new ArgumentOutOfRangeException(nameof(maxBatchSize));
keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
var options = new ExecutionDataflowBlockOptions();
if (dataflowBlockOptions != null)
{
options.BoundedCapacity = dataflowBlockOptions.BoundedCapacity;
options.CancellationToken = dataflowBlockOptions.CancellationToken;
options.MaxMessagesPerTask = dataflowBlockOptions.MaxMessagesPerTask;
options.TaskScheduler = dataflowBlockOptions.TaskScheduler;
}
var output = new BufferBlock<TItem[]>(options);
var queue = new Queue<TItem>(); // Synchronization is not needed
TKey previousKey = default;
var input = new ActionBlock<TItem>(async item =>
{
var key = keySelector(item);
if (queue.Count > 0 && !keyComparer.Equals(key, previousKey))
{
await output.SendAsync(queue.ToArray()).ConfigureAwait(false);
queue.Clear();
}
queue.Enqueue(item);
previousKey = key;
if (queue.Count == maxBatchSize)
{
await output.SendAsync(queue.ToArray()).ConfigureAwait(false);
queue.Clear();
}
}, options);
_ = input.Completion.ContinueWith(async t =>
{
if (queue.Count > 0)
{
await output.SendAsync(queue.ToArray()).ConfigureAwait(false);
queue.Clear();
}
if (t.IsFaulted)
{
((IDataflowBlock)output).Fault(t.Exception.InnerException);
}
else
{
output.Complete();
}
}, TaskScheduler.Default);
return DataflowBlock.Encapsulate(input, output);
}