TPL Dataflow - process in parallel and asynchronously while keeping order

Time: 2017-01-13 11:35:01

Tags: c# asynchronous task-parallel-library tpl-dataflow

I created a TPL Dataflow pipeline that consists of 3 TransformBlocks and a final ActionBlock.

var loadXml = new TransformBlock<Job, Job>(job => { ... }); // I/O
var validateData = new TransformBlock<Job, Job>(job => { ... }); // Parsing&Validating&Calculations
var importJob = new TransformBlock<Job, Job>(job => { ... }); // Saving to database

var loadingFailed = new ActionBlock<Job>(job => CreateResponse(job));
var validationFailed = new ActionBlock<Job>(job => CreateResponse(job));
var reportImport = new ActionBlock<Job>(job => CreateResponse(job));

loadXml.LinkTo(validateData, job => job.ReturnCode == 100);
loadXml.LinkTo(loadingFailed);

validateData.LinkTo(importJob, Job => Job.ReturnCode == 100);
validateData.LinkTo(validationFailed);

importJob.LinkTo(reportImport);

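Each block fills the Job object with the processed data, because I need not only the data itself but also general information for the response message. I basically add an XML path and, if everything goes well, I get back a Response object with the information.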

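How can I make it so that, if two or more files come in that take some time to read from the HDD, the files are read in parallel and asynchronously while keeping their order? If file1 takes longer, file2 has to wait for file1 to finish before passing its data to the next block, which then also validates the data in parallel and asynchronously, but again keeps the order for the block after that?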

Right now it processes all files one after another, even though I call SendAsync on the head block.

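Edit: So I wrote a small test class for the pipeline. It has 3 stages. What I want to achieve is a first TransformBlock that keeps reading files as they come in (SendAsync from a FileSystemWatcher) and outputs them in order once they are done. That means if File1 is a large file and File2 and File3 come in, both are read in while File1 is still being processed, but File2 and File3 have to wait before they can be sent to the second TransformBlock, because File1 is still being read. Stage 2 should work the same way. Stage 3, on the other hand, needs to save the objects generated from File1 to the database, which can be done in parallel and asynchronously. However, the objects from File1 must be processed before those from File2 and File3. So the file contents as a whole need to be processed in the order they came in. I tried to throttle the third TransformBlock by setting MaxDegreeOfParallelism and BoundedCapacity to 1, but that seems to fail and doesn't really keep the order in the Console.WriteLine output.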

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using System.Xml;
using System.Linq;

namespace OrderProcessing
{
    public class Job
    {
        public string Path { get; set; }

        public XmlDocument Document { get; set; }

        public List<Object> BusinessObjects { get; set; }

        public int ReturnCode { get; set; }

        public int ID { get; set; }
    }

    public class Test
    {
        ITargetBlock<Job> pathBlock = null;

        CancellationTokenSource cancellationTokenSource;

        Random rnd = new Random();

        private bool ReadDocument(Job job)
        {
            Console.WriteLine($"ReadDocument {DateTime.Now.TimeOfDay} | Thread {Thread.CurrentThread.ManagedThreadId} is processing Job Id: {job.ID}");
            Task.Delay(rnd.Next(1000, 3000)).Wait();

            // Throw OperationCanceledException if cancellation is requested.
            cancellationTokenSource.Token.ThrowIfCancellationRequested();

            // Read the document
            job.Document = new XmlDocument();

            // Some checking
            return true;
        }

        private bool ValidateXml(Job job)
        {
            Console.WriteLine($"ValidateXml {DateTime.Now.TimeOfDay} | Thread {Thread.CurrentThread.ManagedThreadId} is processing Job Id: {job.ID}");
            Task.Delay(rnd.Next(1000, 3000)).Wait();

            // Throw OperationCanceledException if cancellation is requested.
            cancellationTokenSource.Token.ThrowIfCancellationRequested();

            // Check XML against XSD and perform remaining checks
            job.BusinessObjects = new List<object>();

            // Just for tests
            job.BusinessObjects.Add(new object());
            job.BusinessObjects.Add(new object());

            // Parse Xml and create business objects
            return true;
        }

        private bool ProcessJob(Job job)
        {
            Console.WriteLine($"ProcessJob {DateTime.Now.TimeOfDay} | Thread {Thread.CurrentThread.ManagedThreadId} is processing Job Id: {job.ID}");

            // Throw OperationCanceledException if cancellation is requested.
            cancellationTokenSource.Token.ThrowIfCancellationRequested();

            Parallel.ForEach(job.BusinessObjects, bO =>
            {
                ImportObject(bO);
            });


            // Import the job
            return true;
        }

        private object ImportObject(object o)
        {
            Task.Delay(rnd.Next(1000, 3000)).Wait();

            return new object();
        }

        private void CreateResponse(Job job)
        {
            if(job.ReturnCode == 100)
            {
                Console.WriteLine("ID {0} was successfully imported.", job.ID);

            }
            else
            {
                Console.WriteLine("ID {0} failed to import.", job.ID);
            }

            // Create response XML with returncodes
        }

        ITargetBlock<Job> CreateJobProcessingPipeline()
        {
            var loadXml = new TransformBlock<Job, Job>(job =>
            {
                try
                {
                    if(ReadDocument(job))
                    {
                        // For later error handling
                        job.ReturnCode = 100; // success
                    }
                    else
                    {
                        job.ReturnCode = 200;
                    }

                    return job;
                }
                catch(OperationCanceledException)
                {
                    job.ReturnCode = 300;
                    return job;
                }
            }, TransformBlockOptions());

            var validateXml = new TransformBlock<Job, Job>(job =>
            {
                try
                {
                    if(ValidateXml(job))
                    {
                        // For later error handling
                        job.ReturnCode = 100;
                    }
                    else
                    {
                        job.ReturnCode = 200;
                    }

                    return job;
                }
                catch(OperationCanceledException)
                {
                    job.ReturnCode = 300;
                    return job;
                }
            }, TransformBlockOptions());


            var importJob = new TransformBlock<Job, Job>(job =>
            {
                try
                {
                    if(ProcessJob(job))
                    {
                        // For later error handling
                        job.ReturnCode = 100; // success
                    }
                    else
                    {
                        job.ReturnCode = 200;
                    }

                    return job;
                }
                catch(OperationCanceledException)
                {
                    job.ReturnCode = 300;
                    return job;
                }
            }, ActionBlockOptions());

            var loadingFailed = new ActionBlock<Job>(job => CreateResponse(job), ActionBlockOptions());
            var validationFailed = new ActionBlock<Job>(job => CreateResponse(job), ActionBlockOptions());
            var reportImport = new ActionBlock<Job>(job => CreateResponse(job), ActionBlockOptions());

            //
            // Connect the pipeline
            //
            loadXml.LinkTo(validateXml, job => job.ReturnCode == 100);
            loadXml.LinkTo(loadingFailed);

            validateXml.LinkTo(importJob, Job => Job.ReturnCode == 100);
            validateXml.LinkTo(validationFailed);

            importJob.LinkTo(reportImport);

            // Return the head of the network.
            return loadXml;
        }

        public void Start()
        {
            cancellationTokenSource = new CancellationTokenSource();

            pathBlock = CreateJobProcessingPipeline();
        }

        public async void AddJob(string path, int id)
        {
            Job j = new Job();
            j.Path = path;
            j.ID = id;

            await pathBlock.SendAsync(j);
        }

        static ExecutionDataflowBlockOptions TransformBlockOptions()
        {
            return new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 8,
                BoundedCapacity = 32
            };
        }

        private static ExecutionDataflowBlockOptions ActionBlockOptions()
        {
            return new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 1,
                BoundedCapacity = 1
            };
        }

        public void Cancel()
        {
            if(cancellationTokenSource != null)
                cancellationTokenSource.Cancel();
        }
    }

    class Program
    {
        private static String InputXml = @"C:\XML\Part.xml";
        private static Test _Pipeline;

        static void Main(string[] args)
        {
            _Pipeline = new Test();
            _Pipeline.Start();


            var data = Enumerable.Range(1, 100);

            foreach(var d in data)
                _Pipeline.AddJob(InputXml, d);

            //Wait before closing the application so we can see the results.
            Console.ReadLine();
        }
    }
}

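EDIT2: After one change, setting BoundedCapacity to Unbounded, I get everything in the order it was sent into the pipeline. So it wasn't really out of order before, but I guess messages were being dropped?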

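If I make sure EnsureOrdered is true and use a MaxDegreeOfParallelism of 8 in the last TransformBlock, the items are no longer in order, as you can see in the output below. But that is exactly where they need to be in order, because I save the data to the database and the database needs it in that order. It doesn't really help that the items come out of the last TransformBlock in order, so I guess I can't keep parallelism there?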

EDIT3: Output after using @JSteward's latest code:

ValidateXml 08:27:24.2855461 | Thread 21 is processing Job Id: 36
ValidateXml 08:27:24.2855461 | Thread 28 is processing Job Id: 37
+++ ProcessJob 08:27:24.2880490 | Thread 33 is processing Job Id: 9
ReadDocument 08:27:24.2855461 | Thread 6 is processing Job Id: 56
ValidateXml 08:27:25.2853094 | Thread 19 is processing Job Id: 38
ReadDocument 08:27:25.2853094 | Thread 13 is processing Job Id: 58
+++ ProcessJob 08:27:25.2868091 | Thread 34 is processing Job Id: 13
ReadDocument 08:27:25.2858087 | Thread 16 is processing Job Id: 59
+++ ProcessJob 08:27:25.2858087 | Thread 25 is processing Job Id: 10
+++ ProcessJob 08:27:25.2858087 | Thread 29 is processing Job Id: 12
ReadDocument 08:27:25.2853094 | Thread 11 is processing Job Id: 57
ReadDocument 08:27:25.2873097 | Thread 15 is processing Job Id: 60
ValidateXml 08:27:25.2853094 | Thread 22 is processing Job Id: 40
ValidateXml 08:27:25.2853094 | Thread 23 is processing Job Id: 39
+++ ProcessJob 08:27:25.2858087 | Thread 30 is processing Job Id: 11
ValidateXml 08:27:26.2865381 | Thread 21 is processing Job Id: 41
ReadDocument 08:27:26.2865381 | Thread 14 is processing Job Id: 61
ValidateXml 08:27:26.2865381 | Thread 20 is processing Job Id: 42
ValidateXml 08:27:26.2865381 | Thread 26 is processing Job Id: 43
ReadDocument 08:27:26.2865381 | Thread 17 is processing Job Id: 62
ReadDocument 08:27:26.2870374 | Thread 12 is processing Job Id: 63
+++ ProcessJob 08:27:26.2870374 | Thread 24 is processing Job Id: 14

3 Answers

Answer 0 (score: 5)

You can do this if you link a TransformBlock to an ActionBlock.

This is easiest to demonstrate with a compilable console application.

This application processes a sequence of integers, but you can replace the integers with a custom unit-of-work class.

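(I adapted this code from a utility I wrote for multithreaded file compression using the relatively slow LZMA compression algorithm. The utility has to read the input data sequentially from a file, pass it in chunks to a queue that processes the data with multiple threads in any order, and finally write the compressed chunks to a queue that must preserve the original order of the data chunks.)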

Sample code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

namespace Demo
{
    class Program
    {
        public static void Main()
        {
            var data = Enumerable.Range(1, 100);
            var task = Process(data);

            Console.WriteLine("Waiting for task to complete");
            task.Wait();
            Console.WriteLine("Task complete.");
        }

        public static async Task Process(IEnumerable<int> data)
        {
            var queue = new TransformBlock<int, int>(block => process(block), transformBlockOptions());
            var writer = new ActionBlock<int>(block => write(block), actionBlockOptions());

            queue.LinkTo(writer, new DataflowLinkOptions { PropagateCompletion = true });

            await enqueDataToProcessAndAwaitCompletion(data, queue);

            await writer.Completion;
        }

        static int process(int block)
        {
            Console.WriteLine($"Thread {Thread.CurrentThread.ManagedThreadId} is processing block {block}");
            emulateWorkload();
            return -block;
        }

        static void write(int block)
        {
            Console.WriteLine("Output: " + block);
        }

        static async Task enqueDataToProcessAndAwaitCompletion(IEnumerable<int> data, TransformBlock<int, int> queue)
        {
            await enqueueDataToProcess(data, queue);
            queue.Complete();
        }

        static async Task enqueueDataToProcess(IEnumerable<int> data, ITargetBlock<int> queue)
        {
            foreach (var item in data)
                await queue.SendAsync(item);
        }


        static ExecutionDataflowBlockOptions transformBlockOptions()
        {
            return new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 8,
                BoundedCapacity = 32
            };
        }

        private static ExecutionDataflowBlockOptions actionBlockOptions()
        {
            return new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 1,
                BoundedCapacity = 1
            };
        }

        static Random rng = new Random();
        static object locker = new object();

        static void emulateWorkload()
        {
            int delay;

            lock (locker)
            {
                delay = rng.Next(250, 750);
            }

            Thread.Sleep(delay);
        }
    }
}

Output:

Waiting for task to complete
Thread 8 is processing block 8
Thread 5 is processing block 2
Thread 6 is processing block 6
Thread 4 is processing block 5
Thread 7 is processing block 7
Thread 10 is processing block 4
Thread 9 is processing block 1
Thread 3 is processing block 3
Thread 3 is processing block 9
Thread 8 is processing block 10
Thread 5 is processing block 11
Thread 6 is processing block 12
Thread 9 is processing block 13
Thread 10 is processing block 14
Thread 7 is processing block 15
Thread 8 is processing block 16
Thread 4 is processing block 17
Thread 5 is processing block 18
Thread 3 is processing block 19
Thread 9 is processing block 20
Thread 8 is processing block 21
Output: -1
Output: -2
Output: -3
Output: -4
Output: -5
Output: -6
Output: -7
Output: -8
Output: -9
Output: -10
Output: -11
Output: -12
Output: -13
Thread 6 is processing block 22
Thread 10 is processing block 23
Output: -14
Thread 7 is processing block 24
Output: -15
Output: -16
Thread 6 is processing block 25
Output: -17
Thread 4 is processing block 26
Thread 5 is processing block 27
----------------->SNIP<-----------------
Thread 10 is processing block 93
Thread 8 is processing block 94
Output: -83
Thread 4 is processing block 95
Output: -84
Output: -85
Output: -86
Output: -87
Thread 3 is processing block 96
Output: -88
Thread 6 is processing block 97
Thread 5 is processing block 98
Thread 10 is processing block 99
Thread 9 is processing block 100
Output: -89
Output: -90
Output: -91
Output: -92
Output: -93
Output: -94
Output: -95
Output: -96
Output: -97
Output: -98
Output: -99
Output: -100
Task complete.
Press any key to continue . . .

Note that the "blocks" are processed by multiple threads in any order, but the output order is the same as the input order.

It is important to set the options of the output ActionBlock as in the actionBlockOptions() method, with both MaxDegreeOfParallelism and BoundedCapacity set to 1.

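This is what serializes the output in the correct order. If you set BoundedCapacity and MaxDegreeOfParallelism greater than 1 for the output block, items may be output in the wrong order.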
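As a cross-check of the ordering claim, here is a smaller sketch, assuming the System.Threading.Tasks.Dataflow package is referenced; the class name OrderedOutputSketch and the delay formula are made up for illustration. It keeps only the writer single-threaded and leaves BoundedCapacity at its default, relying on the TransformBlock's ordered output (in System.Threading.Tasks.Dataflow this is the EnsureOrdered option, which defaults to true).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class OrderedOutputSketch
{
    // Sends items through a parallel TransformBlock and records the order
    // in which a single-threaded ActionBlock receives the results.
    public static async Task<List<int>> RunAsync(IEnumerable<int> items)
    {
        var received = new List<int>();

        // Parallel stage: up to 8 items in flight; EnsureOrdered (default true)
        // makes the block release results in input order.
        var transform = new TransformBlock<int, int>(async x =>
        {
            await Task.Delay((x * 37) % 50); // uneven work so items finish out of order
            return -x;
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });

        // Sequential stage: one call at a time, so List<int>.Add is never called concurrently.
        var writer = new ActionBlock<int>(x => received.Add(x),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });

        transform.LinkTo(writer, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var item in items)
            await transform.SendAsync(item);

        transform.Complete();
        await writer.Completion;
        return received;
    }
}
```

Because the writer never runs two calls at once, the plain List<int> needs no lock.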

Answer 1 (score: 5)

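@Matthew Watson has a good suggestion; I just want to point out that, unless you are using the Microsoft.Tpl.Dataflow package, there is no need to limit the final ActionBlock's MaxDegreeOfParallelism and BoundedCapacity to 1. The newer and correct package, System.Threading.Tasks.Dataflow, adds the EnsureOrdered property to the execution block options. Although this doesn't seem to be documented on MSDN, you can find the property and its usage in the TPL Dataflow source.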

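Below is an example and a test demonstrating this behavior; changing EnsureOrdered to false in the execution options will make the test fail. The default is true, so it doesn't need to be set explicitly to get ordered behavior.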

Edit: As @Matthew Watson pointed out below, EnsureOrdered maintains order between propagator blocks, but once the messages are in the ActionBlock they can be processed in any order.

Edit2: Note: if the ActionBlock's MaxDegreeOfParallelism and BoundedCapacity are set to 1 but EnsureOrdered is false, the test will fail and the results will be out of order.

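using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using NUnit.Framework;

[TestFixture]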
public class TestRunner {

    [Test]
    public void TestPipeline() {
        var data = Enumerable.Range(0, 30).Select(x => new Message(x, x)).ToList();

        var target = new MyDataflow();
        target.PostData(data).Wait();

        Assert.IsTrue(data.SequenceEqual(target.OutputMessages));
    }
}

public class MyDataflow {

    private static Random rnd = new Random();

    private BufferBlock<Message> buffer;
    private TransformBlock<Message, Message> xForm1;
    private ActionBlock<Message> action;
    public IList<Message> OutputMessages { get; set; }

    public MyDataflow() {
        OutputMessages = new List<Message>();
        CreatePipeline();
        LinkPipeline();
    }

    public void CreatePipeline() {
        var options = new ExecutionDataflowBlockOptions() {
            BoundedCapacity = 2,
            MaxDegreeOfParallelism = 10,
            EnsureOrdered = true
        };

        buffer = new BufferBlock<Message>();

        xForm1 = new TransformBlock<Message, Message>(x => {
            Console.WriteLine($"{DateTime.Now.TimeOfDay} - Started Id: {x.Id}");
            Task.Delay(rnd.Next(1000, 3000)).Wait();
            Console.WriteLine($"{DateTime.Now.TimeOfDay} - Finished Id: {x.Id}");
            return x;
        }, options);

        action = new ActionBlock<Message>(x => {
            Console.WriteLine($"{DateTime.Now.TimeOfDay} - Output  Id: {x.Id} Value: {x.Value}");

            //this delay will cause the messages to be unordered
            Task.Delay(rnd.Next(1000, 3000)).Wait();

            OutputMessages.Add(x);
        }, options);
    }

    public void LinkPipeline() {
        var options = new DataflowLinkOptions() {
            PropagateCompletion = true
        };

        buffer.LinkTo(xForm1, options);
        xForm1.LinkTo(action, options);
    }

    public Task PostData(IEnumerable<Message> data) {

        foreach (var item in data) {
            buffer.Post(item);
        }
        buffer.Complete();
        return action.Completion;
    }
}

public class Message {
    public Message(int id, int value) {
        this.Id = id;
        this.Value = value;
    }
    public int Id { get; set; }
    public int Value { get; set; }
}
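A minimal sketch of the EnsureOrdered switch on its own, assuming a System.Threading.Tasks.Dataflow version that exposes the property; EnsureOrderedSketch and the reversed delays are illustrative. Later inputs are made to finish first, and the output is drained directly from the TransformBlock with OutputAvailableAsync/ReceiveAsync, so no ActionBlock is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class EnsureOrderedSketch
{
    public static async Task<List<int>> CollectAsync(int count, bool ensureOrdered)
    {
        var block = new TransformBlock<int, int>(async x =>
        {
            // Later inputs get shorter delays, so they tend to finish first.
            await Task.Delay((count - x) * 5);
            return x;
        }, new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = count,
            EnsureOrdered = ensureOrdered
        });

        for (int i = 0; i < count; i++)
            block.Post(i);
        block.Complete();

        // Drain the output in the order the block releases it.
        var output = new List<int>();
        while (await block.OutputAvailableAsync())
            output.Add(await block.ReceiveAsync());

        await block.Completion;
        return output;
    }
}
```

With ensureOrdered: true the list comes back as 0..count-1; with false it usually comes back roughly in completion order, so only the set of items is guaranteed.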

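Edit: Unfortunately we can't access the internal ReorderingBuffer directly. So an alternative to an ActionBlock with BoundedCapacity and MaxDegreeOfParallelism equal to 1 is to link the TransformBlock's ordered output to an observable stream. Note that in the code above, the delay in the parallel-enabled ActionBlock causes the results to be out of order, whereas in the code below the delay in the stream processing does not disturb the order. Basically, this gives the same behavior as a synchronous ActionBlock, and the stream could feed another part of the mesh, etc.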

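using System;   // Subscribe(onNext, onCompleted) below is an extension method from the System.Reactive package
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using NUnit.Framework;

[TestFixture]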
public class TestRunner {

    [Test]
    public void TestPipeline() {
        var data = Enumerable.Range(0, 30).Select(x => new Message(x, x)).ToList();

        var target = new MyDataflow();
        target.PostData(data).Wait();

        Assert.IsTrue(data.SequenceEqual(target.OutputMessages));
    }
}

public class MyDataflow {

    private static Random rnd = new Random();

    private BufferBlock<Message> buffer;
    private TransformBlock<Message, Message> xForm1;
    private IObservable<Message> output;
    private TaskCompletionSource<bool> areWeDoneYet;
    public IList<Message> OutputMessages { get; set; }

    public MyDataflow() {
        OutputMessages = new List<Message>();
        CreatePipeline();
        LinkPipeline();
    }

    public void CreatePipeline() {
        var options = new ExecutionDataflowBlockOptions() {
            BoundedCapacity = 13,
            MaxDegreeOfParallelism = 10,
            EnsureOrdered = true
        };

        buffer = new BufferBlock<Message>();

        xForm1 = new TransformBlock<Message, Message>(x => {
            Console.WriteLine($"{DateTime.Now.TimeOfDay} - Started Id: {x.Id}");
            Task.Delay(rnd.Next(1000, 3000)).Wait();
            Console.WriteLine($"{DateTime.Now.TimeOfDay} - Finished Id: {x.Id}");
            return x;
        }, options);

        output = xForm1.AsObservable<Message>();

        areWeDoneYet = new TaskCompletionSource<bool>();
    }

    public void LinkPipeline() {
        var options = new DataflowLinkOptions() {
            PropagateCompletion = true
        };

        buffer.LinkTo(xForm1, options);

        var subscription = output.Subscribe(msg => {
            Task.Delay(rnd.Next(1000, 3000)).Wait();
            OutputMessages.Add(msg);
        }, () => areWeDoneYet.SetResult(true));            
    }

    public Task<bool> PostData(IEnumerable<Message> data) {            
        foreach (var item in data) {
            buffer.Post(item);
        }
        buffer.Complete();
        return areWeDoneYet.Task;
    }
}

public class Message {
    public Message(int id, int value) {
        this.Id = id;
        this.Value = value;
    }
    public int Id { get; set; }
    public int Value { get; set; }
}
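The Subscribe(onNext, onCompleted) overload used above comes from System.Reactive. If you would rather avoid that dependency, a small hand-written IObserver<T> also works with AsObservable. This is a sketch that assumes, as the test above relies on, that the observable delivers items one at a time and calls OnCompleted after the source finishes; CollectingObserver and ObservableOutputSketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical observer that records items and signals when the stream ends.
public sealed class CollectingObserver<T> : IObserver<T>
{
    private readonly TaskCompletionSource<bool> _done =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    public List<T> Items { get; } = new List<T>();
    public Task Done => _done.Task;

    public void OnNext(T value)
    {
        Thread.Sleep(5); // slow consumer, e.g. a synchronous database write
        Items.Add(value);
    }

    public void OnError(Exception error) => _done.TrySetException(error);
    public void OnCompleted() => _done.TrySetResult(true);
}

public static class ObservableOutputSketch
{
    public static async Task<List<int>> RunAsync(int count)
    {
        var transform = new TransformBlock<int, int>(async x =>
        {
            await Task.Delay((count - x) * 3); // later inputs finish first
            return x;
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

        var observer = new CollectingObserver<int>();
        // AsObservable is a DataflowBlock extension; Subscribe(IObserver<T>) is plain IObservable<T>.
        using (transform.AsObservable().Subscribe(observer))
        {
            for (int i = 0; i < count; i++)
                transform.Post(i);
            transform.Complete();
            await observer.Done;
        }
        return observer.Items;
    }
}
```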

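Edit2: "Also, my pipeline should have 3 stages, how could I link those? So that when the first block finishes the first file it starts outputting data to the next block, which again works in parallel and asynchronously."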

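That isn't determined by how they are linked but by the ExecutionDataflowBlockOptions. With the options shown below, the first block works through as many tasks as there are posted files, each taking its own processing time; as they complete, they are output either to the next processing stage or to the failure-handling ActionBlock, based on the Job.ReturnCode predicate, and the next stage does the same. You can also change the ActionBlock options to handle multiple successes/failures coming from the TransformBlocks.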

var options = new ExecutionDataflowBlockOptions() {
    BoundedCapacity = 10,
    MaxDegreeOfParallelism = 10,
    EnsureOrdered = true
};
var loadXml = new TransformBlock<Job, Job>(job => { ... }, options); // I/O
var validateData = new TransformBlock<Job, Job>(job => { ... }, options); // Parsing&Validating&Calculations
var importJob = new TransformBlock<Job, Job>(job => { ... }, options); // Saving to database

var loadingFailed = new ActionBlock<Job>(job => CreateResponse(job));
var validationFailed = new ActionBlock<Job>(job => CreateResponse(job));
var reportImport = new ActionBlock<Job>(job => CreateResponse(job));

loadXml.LinkTo(validateData, job => job.ReturnCode == 100);
loadXml.LinkTo(loadingFailed);

validateData.LinkTo(importJob, Job => Job.ReturnCode == 100);
validateData.LinkTo(validationFailed);

importJob.LinkTo(reportImport);
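To make the wiring concrete, here is a runnable sketch of the same three-stage shape with predicate routing and completion propagation. Assumptions: SketchJob is a trimmed stand-in for the question's Job, the Task.Delay calls stand in for I/O, validation and the database write, and the failure rules (every 7th and every 5th id) are invented for the test. Each failure path gets its own ActionBlock, as in the question, because a block that receives completion from two linked sources completes as soon as the first of them finishes.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical, trimmed-down stand-in for the question's Job class.
public sealed class SketchJob
{
    public int Id { get; set; }
    public int ReturnCode { get; set; } = 100; // 100 = success, as in the question
}

public static class ThreeStageSketch
{
    public static async Task<(List<int> Imported, List<int> Failed)> RunAsync(IEnumerable<int> ids)
    {
        // EnsureOrdered defaults to true, so each parallel stage emits jobs in input order.
        var parallel = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 };
        var sequential = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 };
        var propagate = new DataflowLinkOptions { PropagateCompletion = true };

        var imported = new List<int>();          // only touched by the sequential report block
        var failed = new ConcurrentQueue<int>(); // touched by two failure blocks

        var load = new TransformBlock<SketchJob, SketchJob>(async j =>
        {
            await Task.Delay((j.Id * 13) % 20);
            if (j.Id % 7 == 0) j.ReturnCode = 200; // pretend every 7th file cannot be read
            return j;
        }, parallel);

        var validate = new TransformBlock<SketchJob, SketchJob>(async j =>
        {
            await Task.Delay((j.Id * 7) % 20);
            if (j.Id % 5 == 0) j.ReturnCode = 200; // pretend every 5th file is invalid
            return j;
        }, parallel);

        var import = new TransformBlock<SketchJob, SketchJob>(async j =>
        {
            await Task.Delay((j.Id * 3) % 20); // placeholder for the database write
            return j;
        }, parallel);

        var loadingFailed = new ActionBlock<SketchJob>(j => failed.Enqueue(j.Id));
        var validationFailed = new ActionBlock<SketchJob>(j => failed.Enqueue(j.Id));
        var report = new ActionBlock<SketchJob>(j => imported.Add(j.Id), sequential);

        // Predicate link first, catch-all second, mirroring the question's wiring.
        load.LinkTo(validate, propagate, j => j.ReturnCode == 100);
        load.LinkTo(loadingFailed, propagate);
        validate.LinkTo(import, propagate, j => j.ReturnCode == 100);
        validate.LinkTo(validationFailed, propagate);
        import.LinkTo(report, propagate);

        foreach (var id in ids)
            await load.SendAsync(new SketchJob { Id = id });
        load.Complete();

        await Task.WhenAll(report.Completion, loadingFailed.Completion, validationFailed.Completion);
        return (imported, failed.OrderBy(x => x).ToList());
    }
}
```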

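EDIT3, in response to the source code the OP added: by setting MaxDegreeOfParallelism and BoundedCapacity to 1 you lost the ordered behavior in the last TransformBlock. Let me be clear: don't do this "to ensure order", it only fights the library. Here is the relevant snippet from TransformBlock: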
            // If parallelism is employed, we will need to support reordering messages that complete out-of-order.
            // However, a developer can override this with EnsureOrdered == false.
            if (dataflowBlockOptions.SupportsParallelExecution && dataflowBlockOptions.EnsureOrdered)
            {
                _reorderingBuffer = new ReorderingBuffer<TOutput>(this, (owningSource, message) => ((TransformBlock<TInput, TOutput>)owningSource)._source.AddMessage(message));
            }
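To illustrate what that buffer does conceptually, here is a deliberately simplified, hypothetical reordering buffer. It is not the library's internal ReorderingBuffer<TOutput>, only the same idea: every input gets a sequence number, results may finish in any order, and only the contiguous run starting at the next expected number is released.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of the reordering idea; NOT the library's internal ReorderingBuffer.
public sealed class SimpleReorderingBuffer<T>
{
    private readonly Dictionary<long, T> _pending = new Dictionary<long, T>();
    private readonly Action<T> _release;
    private long _nextToRelease; // sequence number the consumer is waiting for

    public SimpleReorderingBuffer(Action<T> release) => _release = release;

    // Called when the item with this sequence number finishes (possibly out of order).
    public void AddItem(long sequence, T item)
    {
        lock (_pending)
        {
            _pending.Add(sequence, item);

            // Release the contiguous run that starts at _nextToRelease.
            while (_pending.TryGetValue(_nextToRelease, out var next))
            {
                _pending.Remove(_nextToRelease);
                _release(next);
                _nextToRelease++;
            }
        }
    }
}
```

For example, adding items 2, 0, 1 releases nothing, then "0", then "1" and "2" together.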

Here is a run with 20 data points, with the code modified to use parallelism in the final TransformBlock. I turned it into a basic CSV so it can be viewed in Excel, i.e. replace " " => "," :)

Function,TimeStamp/Inserted JobId,Other,Other,Other,Other,Other,Other,Other,JobId From functions
ReadDocument,04:54.0,|,Thread,6,is,processing,Job,Id:,1
ReadDocument,04:54.0,|,Thread,11,is,processing,Job,Id:,2
ReadDocument,04:56.0,|,Thread,13,is,processing,Job,Id:,3
ReadDocument,04:56.0,|,Thread,6,is,processing,Job,Id:,4
ReadDocument,04:57.0,|,Thread,11,is,processing,Job,Id:,5
ReadDocument,04:57.0,|,Thread,14,is,processing,Job,Id:,6
ReadDocument,04:58.0,|,Thread,15,is,processing,Job,Id:,7
ReadDocument,04:58.0,|,Thread,6,is,processing,Job,Id:,8
ReadDocument,04:59.0,|,Thread,11,is,processing,Job,Id:,9
ReadDocument,04:59.0,|,Thread,16,is,processing,Job,Id:,10
ReadDocument,05:00.0,|,Thread,17,is,processing,Job,Id:,12
ReadDocument,05:00.0,|,Thread,15,is,processing,Job,Id:,11
ReadDocument,05:01.0,|,Thread,16,is,processing,Job,Id:,13
ReadDocument,05:01.0,|,Thread,18,is,processing,Job,Id:,14
ReadDocument,05:02.0,|,Thread,15,is,processing,Job,Id:,15
ReadDocument,05:02.0,|,Thread,17,is,processing,Job,Id:,20
ValidateXml,05:02.0,|,Thread,19,is,processing,Job,Id:,1
ReadDocument,05:02.0,|,Thread,14,is,processing,Job,Id:,17
ReadDocument,05:02.0,|,Thread,13,is,processing,Job,Id:,16
ReadDocument,05:02.0,|,Thread,11,is,processing,Job,Id:,18
ReadDocument,05:02.0,|,Thread,6,is,processing,Job,Id:,19
ValidateXml,05:03.0,|,Thread,16,is,processing,Job,Id:,2
ValidateXml,05:03.0,|,Thread,20,is,processing,Job,Id:,3
ValidateXml,05:04.0,|,Thread,11,is,processing,Job,Id:,4
ValidateXml,05:04.0,|,Thread,21,is,processing,Job,Id:,7
ValidateXml,05:04.0,|,Thread,18,is,processing,Job,Id:,5
ValidateXml,05:04.0,|,Thread,15,is,processing,Job,Id:,6
ValidateXml,05:04.5,|,Thread,16,is,processing,Job,Id:,8
ValidateXml,05:04.5,|,Thread,6,is,processing,Job,Id:,9
ValidateXml,05:04.6,|,Thread,19,is,processing,Job,Id:,10
ProcessJob,05:04.6,|,Thread,14,is,processing,Job,Id:,2
ProcessJob,05:04.6,|,Thread,22,is,processing,Job,Id:,1
ValidateXml,05:05.5,|,Thread,18,is,processing,Job,Id:,11
ValidateXml,05:05.6,|,Thread,20,is,processing,Job,Id:,12
ProcessJob,05:05.6,|,Thread,23,is,processing,Job,Id:,3
ValidateXml,05:06.5,|,Thread,6,is,processing,Job,Id:,13
ValidateXml,05:06.5,|,Thread,21,is,processing,Job,Id:,15
ID,1,was,successfully,imported.,,,,,
ValidateXml,05:06.5,|,Thread,16,is,processing,Job,Id:,14
ValidateXml,05:06.8,|,Thread,15,is,processing,Job,Id:,17
ProcessJob,05:06.8,|,Thread,24,is,processing,Job,Id:,4
ValidateXml,05:06.8,|,Thread,11,is,processing,Job,Id:,16
ProcessJob,05:06.8,|,Thread,22,is,processing,Job,Id:,5
ProcessJob,05:07.5,|,Thread,17,is,processing,Job,Id:,6
ProcessJob,05:07.5,|,Thread,25,is,processing,Job,Id:,8
ValidateXml,05:07.5,|,Thread,19,is,processing,Job,Id:,18
ProcessJob,05:07.5,|,Thread,14,is,processing,Job,Id:,7
ValidateXml,05:08.5,|,Thread,16,is,processing,Job,Id:,19
ProcessJob,05:08.5,|,Thread,23,is,processing,Job,Id:,9
ValidateXml,05:08.5,|,Thread,18,is,processing,Job,Id:,20
ProcessJob,05:09.5,|,Thread,19,is,processing,Job,Id:,10
ID,2,was,successfully,imported.,,,,,
ProcessJob,05:09.5,|,Thread,15,is,processing,Job,Id:,11
ID,3,was,successfully,imported.,,,,,
ProcessJob,05:10.6,|,Thread,14,is,processing,Job,Id:,12
ProcessJob,05:10.9,|,Thread,25,is,processing,Job,Id:,13
ProcessJob,05:11.0,|,Thread,24,is,processing,Job,Id:,14
ID,4,was,successfully,imported.,,,,,
ProcessJob,05:11.1,|,Thread,17,is,processing,Job,Id:,15
ProcessJob,05:11.3,|,Thread,22,is,processing,Job,Id:,16
ID,5,was,successfully,imported.,,,,,
ID,6,was,successfully,imported.,,,,,
ID,7,was,successfully,imported.,,,,,
ID,8,was,successfully,imported.,,,,,
ProcessJob,05:11.6,|,Thread,19,is,processing,Job,Id:,17
ProcessJob,05:11.7,|,Thread,23,is,processing,Job,Id:,18
ID,9,was,successfully,imported.,,,,,
ID,10,was,successfully,imported.,,,,,
ProcessJob,05:12.0,|,Thread,14,is,processing,Job,Id:,19
ProcessJob,05:12.4,|,Thread,15,is,processing,Job,Id:,20
ID,11,was,successfully,imported.,,,,,
ID,12,was,successfully,imported.,,,,,
ID,13,was,successfully,imported.,,,,,
ID,14,was,successfully,imported.,,,,,
ID,15,was,successfully,imported.,,,,,
ID,16,was,successfully,imported.,,,,,
ID,17,was,successfully,imported.,,,,,
ID,18,was,successfully,imported.,,,,,
ID,19,was,successfully,imported.,,,,,
ID,20,was,successfully,imported.,,,,,

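One final note: functions that return a bool to signal success and map exceptions to return codes can be problematic, but that is outside the scope of this question. You can get plenty of good advice on best practices by posting your code on Stack Exchange Code Review.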

Answer 2 (score: 2)

The original answer body got too long.

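Edit4: Response to the OP's Edit2. I'm not sure exactly which changes produced the output you provided, but here is your modified source, with results showing ordered behavior for all 100 inputs.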


Modified source:

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using System.Xml;
using System.Linq;

namespace OrderProcessing {
    public class Job {
        public string Path { get; set; }

        public XmlDocument Document { get; set; }

        public List<Object> BusinessObjects { get; set; }

        public int ReturnCode { get; set; }

        public int ID { get; set; }
    }

    public class Test {
        ITargetBlock<Job> pathBlock = null;

        CancellationTokenSource cancellationTokenSource;

        Random rnd = new Random();

        private bool ReadDocument(Job job) {
            Console.WriteLine($"ReadDocument {DateTime.Now.TimeOfDay} JobId: {job.ID}");
            Task.Delay(rnd.Next(1000, 3000)).Wait();

            // Throw OperationCanceledException if cancellation is requested.
            cancellationTokenSource.Token.ThrowIfCancellationRequested();

            // Read the document
            job.Document = new XmlDocument();

            // Some checking
            return true;
        }

        private bool ValidateXml(Job job) {
            Console.WriteLine($"ValidateXml {DateTime.Now.TimeOfDay} JobId: {job.ID}");
            Task.Delay(rnd.Next(1000, 3000)).Wait();

            // Throw OperationCanceledException if cancellation is requested.
            cancellationTokenSource.Token.ThrowIfCancellationRequested();

            // Check XML against XSD and perform remaining checks
            job.BusinessObjects = new List<object>();

            // Just for tests
            job.BusinessObjects.Add(new object());
            job.BusinessObjects.Add(new object());

            // Parse Xml and create business objects
            return true;
        }

        private bool ProcessJob(Job job) {
            Console.WriteLine($"ProcessJob {DateTime.Now.TimeOfDay} JobId: {job.ID}");

            // Throw OperationCanceledException if cancellation is requested.
            cancellationTokenSource.Token.ThrowIfCancellationRequested();

            Parallel.ForEach(job.BusinessObjects, bO => {
                ImportObject(bO);
            });


            // Import the job
            return true;
        }

        private object ImportObject(object o) {
            Task.Delay(rnd.Next(1000, 3000)).Wait();

            return new object();
        }

        private void CreateResponse(Job job) {
            if (job.ReturnCode == 100) {
                Console.WriteLine($"CreateResponse {DateTime.Now.TimeOfDay} JobId: {job.ID}");

            }
            else {
                Console.WriteLine("ID {0} failed to import.", job.ID);
            }

            // Create response XML with returncodes
        }

        ITargetBlock<Job> CreateJobProcessingPipeline() {
            var loadXml = new TransformBlock<Job, Job>(job => {
                try {
                    if (ReadDocument(job)) {
                        // For later error handling
                        job.ReturnCode = 100; // success
                    }
                    else {
                        job.ReturnCode = 200;
                    }

                    return job;
                }
                catch (OperationCanceledException) {
                    job.ReturnCode = 300;
                    return job;
                }
            }, TransformBlockOptions());

            var validateXml = new TransformBlock<Job, Job>(job => {
                try {
                    if (ValidateXml(job)) {
                        // For later error handling
                        job.ReturnCode = 100;
                    }
                    else {
                        job.ReturnCode = 200;
                    }

                    return job;
                }
                catch (OperationCanceledException) {
                    job.ReturnCode = 300;
                    return job;
                }
            }, TransformBlockOptions());


            var importJob = new TransformBlock<Job, Job>(job => {
                try {
                    if (ProcessJob(job)) {
                        // For later error handling
                        job.ReturnCode = 100; // success
                    }
                    else {
                        job.ReturnCode = 200;
                    }

                    return job;
                }
                catch (OperationCanceledException) {
                    job.ReturnCode = 300;
                    return job;
                }
            }, TransformBlockOptions());

            var loadingFailed = new ActionBlock<Job>(job => CreateResponse(job), ActionBlockOptions());
            var validationFailed = new ActionBlock<Job>(job => CreateResponse(job), ActionBlockOptions());
            var reportImport = new ActionBlock<Job>(job => CreateResponse(job), ActionBlockOptions());

            //
            // Connect the pipeline
            //
            loadXml.LinkTo(validateXml, job => job.ReturnCode == 100);
            loadXml.LinkTo(loadingFailed);

            validateXml.LinkTo(importJob, Job => Job.ReturnCode == 100);
            validateXml.LinkTo(validationFailed);

            //importJob.LinkTo(reportImport);

            // Consume importJob's ordered output sequentially.
            // Subscribe(onNext) is an extension method from the System.Reactive package.
            var output = importJob.AsObservable();
            var subscription = output.Subscribe(x => {
                if (x.ReturnCode == 100) {
                    //job success
                    Console.WriteLine($"SendToDataBase {DateTime.Now.TimeOfDay} JobId: {x.ID}");
                }
                else {
                    //handle fault
                    Console.WriteLine($"Job Failed {DateTime.Now.TimeOfDay} JobId: {x.ID}");
                }
            });

            // Return the head of the network.
            return loadXml;
        }

        public void Start() {
            cancellationTokenSource = new CancellationTokenSource();

            pathBlock = CreateJobProcessingPipeline();
        }

        public async void AddJob(string path, int id) {
            Job j = new Job();
            j.Path = path;
            j.ID = id;

            await pathBlock.SendAsync(j);
        }

        static ExecutionDataflowBlockOptions TransformBlockOptions() {
            return new ExecutionDataflowBlockOptions {
                MaxDegreeOfParallelism = 8,
                BoundedCapacity = 32
            };
        }

        private static ExecutionDataflowBlockOptions ActionBlockOptions() {
            return new ExecutionDataflowBlockOptions {
                MaxDegreeOfParallelism = 1,
                BoundedCapacity = 1
            };
        }

        public void Cancel() {
            if (cancellationTokenSource != null)
                cancellationTokenSource.Cancel();
        }
    }

    class Program {
        private static String InputXml = @"C:\XML\Part.xml";
        private static Test _Pipeline;

        static void Main(string[] args) {
            _Pipeline = new Test();
            _Pipeline.Start();


            var data = Enumerable.Range(1, 100);

            foreach (var d in data)
                _Pipeline.AddJob(InputXml, d);

            //Wait before closing the application so we can see the results.
            Console.ReadLine();
        }
    }
}

Edit: The new subscription sends your items to the DB, or handles faulted jobs however you choose.
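For the requirement in the question that File1's objects are saved before File2's while the objects of one file may still be saved in parallel, one option is a final block with MaxDegreeOfParallelism = 1 whose body saves a single job's objects concurrently. A sketch under stated assumptions: each job is represented by a hypothetical int[] of object ids, Task.Delay stands in for an async database insert, and OrderedSaveSketch is an illustrative name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class OrderedSaveSketch
{
    // Each int[] is a hypothetical stand-in for one job's BusinessObjects.
    public static async Task<List<string>> RunAsync(IReadOnlyList<int[]> jobs)
    {
        var log = new ConcurrentQueue<string>();

        // MaxDegreeOfParallelism = 1: a job starts only after the previous job is fully saved.
        var save = new ActionBlock<(int JobId, int[] Objects)>(async job =>
        {
            // Within one job, the objects are saved concurrently.
            await Task.WhenAll(job.Objects.Select(async o =>
            {
                await Task.Delay((o * 17) % 30); // placeholder for an async database insert
                log.Enqueue($"job{job.JobId}:obj{o}");
            }));
            log.Enqueue($"job{job.JobId}:done");
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });

        for (int i = 0; i < jobs.Count; i++)
            await save.SendAsync((JobId: i, Objects: jobs[i]));

        save.Complete();
        await save.Completion;
        return log.ToList();
    }
}
```

The question's ProcessJob does the same with Parallel.ForEach; the async form avoids blocking thread-pool threads while the inserts are pending.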

More resources:

Stack Exchange Code Review

Dataflow Source