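I am developing a Dataflow pipeline that reads a collection of files and, for each line in each file, runs a series of Dataflow blocks.
Once all the steps have completed for every line in a file, I want to run further blocks on the file itself, but I don't know how to do that.
Splitting the processing with a TransformManyBlock is straightforward, but how do you then merge the results back together?
I am used to Apache Camel's Splitter and Aggregator functionality. Or is there a fundamental disconnect between Dataflow's intent and the way I want to use it?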
Answer 0: (score: 1)
You should probably look into JoinBlock and BatchedJoinBlock. Both can join two or three sources, and you can set up a filter for them to collect specific items.
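As a rough illustration of what these two blocks do, here is a minimal sketch. The file name, the line count, and the error string are made-up values for demonstration, not part of the original answer. JoinBlock pairs one item from each target into a tuple, while BatchedJoinBlock collects a fixed total number of items across its targets.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

public static class JoinBlockSketch
{
    public static void Main()
    {
        // JoinBlock emits a tuple once every target has received an item.
        var join = new JoinBlock<string, int>();
        join.Target1.Post("a.txt"); // hypothetical file name
        join.Target2.Post(3);       // hypothetical line count
        Tuple<string, int> pair = join.Receive();
        Console.WriteLine($"{pair.Item1}: {pair.Item2} lines");

        // BatchedJoinBlock emits after batchSize items in total,
        // regardless of which target they arrived on.
        var batched = new BatchedJoinBlock<int, string>(3);
        batched.Target1.Post(1);
        batched.Target1.Post(2);
        batched.Target2.Post("parse error"); // hypothetical failure
        Tuple<IList<int>, IList<string>> batch = batched.Receive();
        Console.WriteLine($"{batch.Item1.Count} results, {batch.Item2.Count} errors");
    }
}
```

Neither block knows by itself which items belong to the same file, so any per-file grouping still has to be arranged by the code that feeds the targets.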
Answer 1: (score: 1)
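Properly implementing separate Splitter and Aggregator blocks would be too complex to write and too cumbersome to use. So I came up with a simpler API that encapsulates two blocks, a master block and a detail block. Each block has its own processing options. The master block performs the splitting and aggregating operations, while the detail block performs the transformation of each detail. The only requirement on the two separate sets of options is that the CancellationToken must be the same for both. All other options (MaxDegreeOfParallelism, BoundedCapacity, EnsureOrdered, TaskScheduler, etc.) can be set independently for each block.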
// Place inside a static class. Requires the namespaces System,
// System.Collections.Generic, System.Threading.Tasks and
// System.Threading.Tasks.Dataflow.
public static TransformBlock<TInput, TOutput>
    CreateSplitterAggregatorBlock<TInput, TDetail, TDetailResult, TOutput>(
    Func<TInput, Task<IEnumerable<TDetail>>> split,
    Func<TDetail, Task<TDetailResult>> transformDetail,
    Func<TInput, TDetailResult[], TOutput> aggregate,
    ExecutionDataflowBlockOptions splitAggregateOptions = null,
    ExecutionDataflowBlockOptions transformDetailOptions = null)
{
    if (split == null) throw new ArgumentNullException(nameof(split));
    if (aggregate == null) throw new ArgumentNullException(nameof(aggregate));
    if (transformDetail == null)
        throw new ArgumentNullException(nameof(transformDetail));
    splitAggregateOptions = splitAggregateOptions ??
        new ExecutionDataflowBlockOptions();
    var cancellationToken = splitAggregateOptions.CancellationToken;
    transformDetailOptions = transformDetailOptions ??
        new ExecutionDataflowBlockOptions() { CancellationToken = cancellationToken };
    // Both sets of options must share the same CancellationToken.
    if (transformDetailOptions.CancellationToken != cancellationToken)
        throw new ArgumentException("Incompatible options", "CancellationToken");

    // Detail block: receives cold (not yet started) tasks and runs them,
    // subject to the limits in transformDetailOptions.
    var detailTransformer = new ActionBlock<Task<Task<TDetailResult>>>(async task =>
    {
        try
        {
            task.RunSynchronously();
            await task.Unwrap().ConfigureAwait(false);
        }
        catch { } // Suppress exceptions (errors are propagated through the task)
    }, transformDetailOptions);

    // Master block: splits each input, sends the details to the detail block,
    // waits for all of their results, and then aggregates them.
    return new TransformBlock<TInput, TOutput>(async item =>
    {
        IEnumerable<TDetail> details = await split(item); //continue on captured context
        TDetailResult[] detailResults = await Task.Run(async () =>
        {
            var tasks = new List<Task<TDetailResult>>();
            foreach (var detail in details)
            {
                var taskFactory = new Task<Task<TDetailResult>>(
                    () => transformDetail(detail), cancellationToken);
                var accepted = await detailTransformer.SendAsync(taskFactory,
                    cancellationToken).ConfigureAwait(false);
                if (!accepted)
                {
                    cancellationToken.ThrowIfCancellationRequested();
                    throw new InvalidOperationException("Unexpected detail rejection.");
                }
                var task = taskFactory.Unwrap();
                // Assume that the detailTransformer will never fail, and so the task
                // will eventually complete. Guarding against this unlikely scenario
                // with Task.WhenAny(task, detailTransformer.Completion) seems overkill.
                tasks.Add(task);
            }
            return await Task.WhenAll(tasks).ConfigureAwait(false);
        }); // continue on captured context
        return aggregate(item, detailResults);
    }, splitAggregateOptions);
}
// Overload with synchronous lambdas
public static TransformBlock<TInput, TOutput>
    CreateSplitterAggregatorBlock<TInput, TDetail, TDetailResult, TOutput>(
    Func<TInput, IEnumerable<TDetail>> split,
    Func<TDetail, TDetailResult> transformDetail,
    Func<TInput, TDetailResult[], TOutput> aggregate,
    ExecutionDataflowBlockOptions splitAggregateOptions = null,
    ExecutionDataflowBlockOptions transformDetailOptions = null)
{
    return CreateSplitterAggregatorBlock(
        item => Task.FromResult(split(item)),
        detail => Task.FromResult(transformDetail(detail)),
        aggregate, splitAggregateOptions, transformDetailOptions);
}
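The implementation above hands cold (created but not started) nested tasks to the detail block and observes their results through Unwrap. The following standalone sketch isolates that pattern so it can be tried without Dataflow. DoubleAsync is a hypothetical stand-in for transformDetail and is not part of the original code.

```csharp
using System;
using System.Threading.Tasks;

public static class ColdTaskSketch
{
    // Hypothetical stand-in for an asynchronous transformDetail delegate.
    private static async Task<int> DoubleAsync(int n)
    {
        await Task.Delay(10);
        return n * 2;
    }

    public static async Task Main()
    {
        bool started = false;

        // The Task constructor creates a cold task: the lambda does not run yet.
        var cold = new Task<Task<int>>(() =>
        {
            started = true;
            return DoubleAsync(21);
        });

        // Unwrap returns a proxy that completes when the inner task completes.
        Task<int> proxy = cold.Unwrap();
        Console.WriteLine($"started: {started}");

        // Whoever holds the cold task decides when the work begins.
        cold.RunSynchronously();
        int result = await proxy;
        Console.WriteLine($"started: {started}, result: {result}");
    }
}
```

In the block above the producer keeps the proxy and awaits it with Task.WhenAll, while the ActionBlock decides when each cold task actually starts, which is how the detail block's own options can throttle the work.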
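Below is a usage example of this block. The inputs are strings containing comma-separated numbers. Each string is split, each number is doubled, and finally the doubled numbers of each input string are summed.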
var processor = CreateSplitterAggregatorBlock<string, int, int, int>(split: str =>
{
    var parts = str.Split(',');
    return parts.Select(part => Int32.Parse(part));
}, transformDetail: number =>
{
    return number * 2;
}, aggregate: (str, numbersArray) =>
{
    var sum = numbersArray.Sum();
    Console.WriteLine($"[{str}] => {sum}");
    return sum;
});
processor.Post("1, 2, 3");
processor.Post("4, 5");
processor.Post("6, 7, 8, 9");
processor.Complete();
processor.LinkTo(DataflowBlock.NullTarget<int>());
processor.Completion.Wait();
Output:
[1, 2, 3] => 12
[4, 5] => 18
[6, 7, 8, 9] => 60
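The usage example relies on the default options for both blocks. As a sketch of the one stated requirement, that both option sets share the same CancellationToken, two separate sets could be built as follows. The specific values (a capacity of 10, a parallelism of 4) are arbitrary assumptions chosen for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

public static class OptionsSketch
{
    public static void Main()
    {
        var cts = new CancellationTokenSource();

        // Options for the master block (splitting and aggregating).
        var splitAggregateOptions = new ExecutionDataflowBlockOptions
        {
            CancellationToken = cts.Token,
            BoundedCapacity = 10 // arbitrary value
        };

        // Options for the detail block (per-detail transformation).
        var transformDetailOptions = new ExecutionDataflowBlockOptions
        {
            CancellationToken = cts.Token, // must match the master block's token
            MaxDegreeOfParallelism = 4     // arbitrary value
        };

        // The same comparison that CreateSplitterAggregatorBlock performs.
        Console.WriteLine(splitAggregateOptions.CancellationToken
            == transformDetailOptions.CancellationToken);

        // Both sets would then be passed as the last two arguments:
        // CreateSplitterAggregatorBlock<string, int, int, int>(split,
        //     transformDetail, aggregate, splitAggregateOptions,
        //     transformDetailOptions);

        cts.Dispose();
    }
}
```

With a mismatched token, the method shown earlier throws an ArgumentException instead of creating the block.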