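I built something similar to a web crawler to generate reports on the 1000+ web services I need to manage. To handle fetching and processing the data, I created a TPL Dataflow pipeline. The pipeline I have in mind looks a bit like this (sorry for my drawing skills :D):
I have written an implementation, and everything worked fine until I started the pipeline as a whole. I fed 500 objects into the pipeline as input and expected the program to run for a while, but it stopped executing after moving on to the execution block. After checking the flow of the program, it looks to me as if completion propagates too quickly to the Dispose block. I created a small sample project with the same pipeline to check whether the problem is my Input class implementation or the pipeline itself. The sample code is this: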
    using System;
    using System.Threading.Tasks;

    // Assumption: the post does not show the Type enum, so this is a minimal stand-in.
    public enum Type { A, B }

    public class Job
    {
        public int Ticker { get; set; }
        public Type Type { get; }

        public Job(Type type)
        {
            Type = type;
        }

        public Task Prepare()
        {
            Console.WriteLine("Preparing");
            Ticker = 0;
            return Task.CompletedTask;
        }

        public Task Tick()
        {
            Console.WriteLine("Ticking");
            Ticker++;
            return Task.CompletedTask;
        }

        // True when the job is finished or has reached a multiple of 100000 ticks.
        public bool IsCommitable()
        {
            Console.WriteLine("Trying to commit");
            return IsFinished() || (Ticker != 0 && Ticker % 100000 == 0);
        }

        public bool IsFinished()
        {
            Console.WriteLine("Trying to finish");
            return Ticker == 1000000;
        }

        public void IntermediateCleanUp()
        {
            Console.WriteLine("intermediate Cleanup");
            Ticker = Ticker - 120;
        }

        public void finalCleanUp()
        {
            Console.WriteLine("Final Cleanup");
            Ticker = -1;
        }
    }
This is my Input class, which is fed into the Preparation block.
    public class Dataflow
    {
        private TransformBlock<Job, Job> _preparationsBlock;
        private BufferBlock<Job> _balancerBlock;
        private readonly ExecutionDataflowBlockOptions _options = new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = 4
        };
        private readonly DataflowLinkOptions _linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        private TransformBlock<Job, Job> _typeATickBlock;
        private TransformBlock<Job, Job> _typeBTickBlock;
        private TransformBlock<Job, Job> _writeBlock;
        private TransformBlock<Job, Job> _intermediateCleanupBlock;
        private ActionBlock<Job> _finalCleanupBlock;

        public async Task Process()
        {
            CreateBlocks();
            ConfigureBlocks();
            for (int i = 0; i < 500; i++)
            {
                await _preparationsBlock.SendAsync(new Job(i % 2 == 0 ? Type.A : Type.B));
            }
            _preparationsBlock.Complete();
            await Task.WhenAll(_preparationsBlock.Completion, _finalCleanupBlock.Completion);
        }

        private void CreateBlocks()
        {
            _preparationsBlock = new TransformBlock<Job, Job>(async job =>
            {
                await job.Prepare();
                return job;
            }, _options);
            _balancerBlock = new BufferBlock<Job>(_options);
            _typeATickBlock = new TransformBlock<Job, Job>(async job =>
            {
                await job.Tick();
                return job;
            }, _options);
            _typeBTickBlock = new TransformBlock<Job, Job>(async job =>
            {
                await job.Tick();
                await job.Tick();
                return job;
            }, _options);
            _writeBlock = new TransformBlock<Job, Job>(job =>
            {
                Console.WriteLine(job.Ticker);
                return job;
            }, _options);
            _finalCleanupBlock = new ActionBlock<Job>(job => job.finalCleanUp(), _options);
            _intermediateCleanupBlock = new TransformBlock<Job, Job>(job =>
            {
                job.IntermediateCleanUp();
                return job;
            }, _options);
        }

        private void ConfigureBlocks()
        {
            _preparationsBlock.LinkTo(_balancerBlock, _linkOptions);
            _balancerBlock.LinkTo(_typeATickBlock, _linkOptions, job => job.Type == Type.A);
            _balancerBlock.LinkTo(_typeBTickBlock, _linkOptions, job => job.Type == Type.B);
            _typeATickBlock.LinkTo(_typeATickBlock, _linkOptions, job => !job.IsCommitable());
            _typeATickBlock.LinkTo(_writeBlock, _linkOptions, job => job.IsCommitable());
            _typeBTickBlock.LinkTo(_typeBTickBlock, _linkOptions, job => !job.IsCommitable());
            _writeBlock.LinkTo(_intermediateCleanupBlock, _linkOptions, job => !job.IsFinished());
            _writeBlock.LinkTo(_finalCleanupBlock, _linkOptions, job => job.IsFinished());
            _intermediateCleanupBlock.LinkTo(_typeATickBlock, _linkOptions, job => job.Type == Type.A);
        }
    }
This is my dataflow pipeline, representing my "artwork" above :D. All of this is executed in my Scheduler, which is started in Program.cs:
    using System;
    using System.Timers;

    public class Scheduler
    {
        // Timer here is System.Timers.Timer (it raises Elapsed with ElapsedEventArgs).
        private readonly Timer _timer;
        private readonly Dataflow _flow;

        public Scheduler(int intervall)
        {
            _timer = new Timer(intervall);
            _flow = new Dataflow();
        }

        public void Start()
        {
            _timer.AutoReset = false;
            _timer.Elapsed += _timer_Elapsed;
            _timer.Start();
        }

        private async void _timer_Elapsed(object sender, ElapsedEventArgs e)
        {
            try
            {
                _timer.Stop();
                Console.WriteLine("Timer stopped");
                await _flow.Process().ConfigureAwait(false);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
            finally
            {
                Console.WriteLine("Timer started again.");
                _timer.Start();
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var scheduler = new Scheduler(1000);
            scheduler.Start();
            Console.ReadKey();
        }
    }
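The console output I get is:
    Timer stopped
    Preparing
    Ticking
    Trying to commit
    Trying to finish
    Ticking
    Trying to commit
    Trying to finish
    Ticking
    Trying to commit
    Trying to finish
    Ticking
    Trying to commit
    Trying to finish
    Ticking
    Trying to commit
    Trying to finish
    Ticking
    Trying to commit
    Trying to finish
    Ticking
    Trying to commit
    Trying to finish
    Ticking
    Trying to commit
    Trying to finish
    Ticking
    Trying to commit
    Trying to finish
    Ticking
    Trying to commit
    Trying to finish
    Trying to commit
    Trying to finish
At that point the program seems to stop working: I don't hit any breakpoints and no further messages appear. I think all of my blocks have received the completion signal and therefore stopped accepting new items. So my question is: how do I manage the completion signal so that the pipeline only finishes when there is no more work to do?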
Answer 0 (score: 2)
The main problem with your flow is the feedback loop into your tick block. It causes two problems.
First: back pressure
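With _typeATickBlock linked back onto itself, it stops accepting all messages once it reaches its capacity. In your case that capacity is 4, which means that once it has 3 messages in its output buffer and one being processed, it stops accepting and passing on messages: the messages in the output buffer are offered back to the block itself, which has no room to take them. You can see this by adding the following line to the block's delegate:
    Console.WriteLine($"Tick Block {_typeATickBlock.InputCount}/{_typeATickBlock.OutputCount}");
Which will output:
    Tick Block 0/3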
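To make the capacity limit concrete, here is a minimal sketch that is not part of the original pipeline. The names BackPressureSketch and CountAccepted are made up for illustration. It posts items to a bounded TransformBlock that has no consumer and counts how many the block accepts.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper, for illustration only.
public static class BackPressureSketch
{
    // Posts `attempts` items to a bounded block that nobody consumes from
    // and returns how many of them the block accepted.
    public static int CountAccepted(int capacity, int attempts)
    {
        var block = new TransformBlock<int, int>(n => n,
            new ExecutionDataflowBlockOptions { BoundedCapacity = capacity });

        int accepted = 0;
        for (int i = 0; i < attempts; i++)
        {
            // Post does not wait: it returns false when the block has no room.
            if (block.Post(i))
            {
                accepted++;
            }
        }
        return accepted;
    }
}
```

Calling BackPressureSketch.CountAccepted(4, 10) should return 4: processed items stay in the output buffer and keep counting against BoundedCapacity until something takes them out of the block, which is the situation the self-linked tick block ends up in.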
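To fix this you can add any buffering block to the loop, either a BufferBlock or a TransformBlock. The key is the bounded capacity of that buffer. In your case every message has to be routed back to the tick block, so the buffer's capacity needs to match the number of messages in flight at any given time. Here that is 500.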
    // Requires a new field: private TransformBlock<Job, Job> _printingBuffer;
    _printingBuffer = new TransformBlock<Job, Job>(job =>
    {
        Console.WriteLine($"{_printingBuffer.InputCount}/{_printingBuffer.OutputCount}");
        return job;
    }, new ExecutionDataflowBlockOptions() { BoundedCapacity = 500 });
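In your real code you may not know this value, so Unbounded is probably your best option to avoid locking up the pipeline, but you can tune the value to match your incoming volume.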
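As a rough sketch of the unbounded option, the feedback buffer could be a plain BufferBlock created with DataflowBlockOptions.Unbounded. The helper name CreateFeedbackBuffer is an assumption made for this example and does not appear in the original code.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper, for illustration only.
public static class FeedbackBufferSketch
{
    // Builds a pass-through buffer for the feedback edge that never
    // declines a message because of capacity.
    public static BufferBlock<T> CreateFeedbackBuffer<T>()
    {
        return new BufferBlock<T>(new DataflowBlockOptions
        {
            // Unbounded is also the default; it is spelled out here for clarity.
            BoundedCapacity = DataflowBlockOptions.Unbounded
        });
    }
}
```

The trade-off is memory: an unbounded buffer cannot stall the loop, but it also applies no back pressure, so it will hold however many jobs are waiting to go round again.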
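Second: completion propagation
With a feedback loop in your pipeline, propagating completion becomes harder than simply setting the link option. Once completion reaches the tick block, it stops accepting all messages, even the ones that still need to be processed. To avoid this you need to hold back propagation until all messages have made it through the loop. First, stop the propagation just before the tick block, then check the buffers of every block that takes part in the loop. Once all of those buffers are empty, propagate completion (and faults) to the block.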
    _balancerBlock.Completion.ContinueWith(tsk =>
    {
        // Polls (busy-waits) until every buffer in the loop is empty,
        // then completes the tick block manually.
        while (!_typeATickBlock.Completion.IsCompleted)
        {
            if (_printingBuffer.InputCount == 0 && _printingBuffer.OutputCount == 0
                && _typeATickBlock.InputCount == 0 && _typeATickBlock.OutputCount == 0)
            {
                _typeATickBlock.Complete();
            }
        }
    });
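The polling loop above is the answer's approach. One possible alternative, shown here only as a sketch under stated assumptions, is to count the items that are still in the loop and complete the tick block when the last one leaves. It is a different technique from the buffer check. The names LoopCompletionSketch, RunAsync, inFlight and ticksPerItem are invented for this example, and plain integers stand in for Job objects.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical sketch, not the original pipeline.
public static class LoopCompletionSketch
{
    // Sends itemCount items through a tick loop, ticksPerItem rounds each,
    // and returns how many items reached the sink.
    public static async Task<int> RunAsync(int itemCount, int ticksPerItem)
    {
        int inFlight = itemCount; // items that have not yet left the loop
        int finished = 0;

        var tick = new TransformBlock<int, int>(n => n + 1,
            new ExecutionDataflowBlockOptions { BoundedCapacity = 4 });
        var feedback = new BufferBlock<int>(); // unbounded by default

        var sink = new ActionBlock<int>(n =>
        {
            finished++;
            // The last item to leave the loop completes the tick block.
            if (Interlocked.Decrement(ref inFlight) == 0)
            {
                tick.Complete();
            }
        });

        // Finished items leave the loop; completion flows on to the sink.
        tick.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true },
            n => n >= ticksPerItem);
        // Unfinished items go round again; no completion on the feedback edge.
        tick.LinkTo(feedback, n => n < ticksPerItem);
        feedback.LinkTo(tick);

        // With nothing to send, no item would ever trigger completion.
        if (itemCount == 0)
        {
            tick.Complete();
        }
        for (int i = 0; i < itemCount; i++)
        {
            await tick.SendAsync(0);
        }

        await sink.Completion;
        return finished;
    }
}
```

This only works if the total number of items is known before the first one can leave the loop, which holds here because the counter is initialised up front. In the pipeline from the question, the decrement would belong wherever a job leaves the loop for good, for example in the final cleanup block.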
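Finally
The complete ConfigureBlocks, with the completion handling set up and the buffer inserted, should look like this. Note that I am only propagating completion here, not faults, and that I removed the type B branch.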
    private void ConfigureBlocks()
    {
        _preparationsBlock.LinkTo(_balancerBlock, _linkOptions);
        _balancerBlock.LinkTo(_typeATickBlock, job => job.Type == Type.A);
        _balancerBlock.Completion.ContinueWith(tsk =>
        {
            while (!_typeATickBlock.Completion.IsCompleted)
            {
                if (_printingBuffer.InputCount == 0 && _printingBuffer.OutputCount == 0
                    && _typeATickBlock.InputCount == 0 && _typeATickBlock.OutputCount == 0)
                {
                    _typeATickBlock.Complete();
                }
            }
        });
        _typeATickBlock.LinkTo(_printingBuffer, job => !job.IsCommitable());
        _printingBuffer.LinkTo(_typeATickBlock);
        _typeATickBlock.LinkTo(_writeBlock, _linkOptions, job => job.IsCommitable());
        _writeBlock.LinkTo(_intermediateCleanupBlock, _linkOptions, job => !job.IsFinished());
        _writeBlock.LinkTo(_finalCleanupBlock, _linkOptions, job => job.IsFinished());
        _intermediateCleanupBlock.LinkTo(_typeATickBlock, _linkOptions, job => job.Type == Type.A);
    }
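A while back I wrote a blog post about handling completion with feedback loops. The blog is no longer active, but the post may be of more help. It can be retrieved from the WayBackMachine.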