TPL Dataflow: confused about the core design

Time: 2012-12-12 07:21:04

Tags: c# concurrency task-parallel-library c#-5.0 tpl-dataflow

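I have been working with TPL Dataflow, but I keep stumbling over an issue that I cannot resolve:

I have the following architecture:

BroadcastBlock<List<Object1>> -> 2 different TransformBlock<List<Object1>, Tuple<int, List<Object1>>> -> both linked to a TransformManyBlock<Tuple<int, List<Object1>>, Object2>

I vary the lambda expression in the TransformManyBlock at the end of the chain: (a) code that performs operations on the streamed tuple, (b) no code at all.

Inside the TransformBlocks I measure the time from the arrival of the first item until TransformBlock.Completion indicates that the block has completed (the broadCastBlock is linked to the transform blocks with PropagateCompletion set to true).

What I cannot reconcile is why the TransformBlocks in case (b) complete about 5-6 times faster than in case (a). This goes completely against the intent of the whole TDF design. The items from the transform blocks have been handed over to the TransformManyBlock, so what the TransformManyBlock does with those items should have no influence on when the transform blocks complete. I do not see a single reason why anything that happens in the TransformManyBlock should have a bearing on the preceding TransformBlocks.

Can anyone reconcile this odd observation?

Here is some code that shows the difference. When running the code, make sure to change the following two lines from: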

        tfb1.transformBlock.LinkTo(transformManyBlock);
        tfb2.transformBlock.LinkTo(transformManyBlock);

to:

        tfb1.transformBlock.LinkTo(transformManyBlockEmpty);
        tfb2.transformBlock.LinkTo(transformManyBlockEmpty);

in order to observe the difference in runtime of the preceding transformBlocks.

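using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program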
{
    static void Main(string[] args)
    {
        Test test = new Test();
        test.Start();
    }
}

class Test
{
    private const int numberTransformBlocks = 2;
    private int currentGridPointer;
    private Dictionary<int, List<Tuple<int, List<Object1>>>> grid;

    private BroadcastBlock<List<Object1>> broadCastBlock;
    private TransformBlockClass tfb1;
    private TransformBlockClass tfb2;

    private TransformManyBlock<Tuple<int, List<Object1>>, Object2> 
               transformManyBlock;
    private TransformManyBlock<Tuple<int, List<Object1>>, Object2> 
               transformManyBlockEmpty;
    private ActionBlock<Object2> actionBlock;

    public Test()
    {
        grid = new Dictionary<int, List<Tuple<int, List<Object1>>>>();

        broadCastBlock = new BroadcastBlock<List<Object1>>(list => list);

        tfb1 = new TransformBlockClass();
        tfb2 = new TransformBlockClass();

        transformManyBlock = new TransformManyBlock<Tuple<int, List<Object1>>, Object2>
                (newTuple =>
            {
                for (int counter = 1; counter <= 10000000;  counter++)
                {
                    double result = Math.Sqrt(counter + 1.0);
                }

                return new Object2[0];

            });

        transformManyBlockEmpty 
            = new TransformManyBlock<Tuple<int, List<Object1>>, Object2>(
                  tuple =>
            {
                return new Object2[0];
            });

        actionBlock = new ActionBlock<Object2>(list =>
            {
                int tester = 1;
                //flush transformManyBlock
            });

        //linking
        broadCastBlock.LinkTo(tfb1.transformBlock
                              , new DataflowLinkOptions 
                                  { PropagateCompletion = true }
                              );
        broadCastBlock.LinkTo(tfb2.transformBlock
                              , new DataflowLinkOptions 
                                  { PropagateCompletion = true }
                              );

        //link either to ->transformManyBlock or -> transformManyBlockEmpty
        tfb1.transformBlock.LinkTo(transformManyBlock);
        tfb2.transformBlock.LinkTo(transformManyBlock);

        transformManyBlock.LinkTo(actionBlock
                                  , new DataflowLinkOptions 
                                       { PropagateCompletion = true }
                                  );
        transformManyBlockEmpty.LinkTo(actionBlock
                                       , new DataflowLinkOptions 
                                            { PropagateCompletion = true }
                                       );

        //completion
        Task.WhenAll(tfb1.transformBlock.Completion
                     , tfb2.transformBlock.Completion)
                       .ContinueWith(_ =>
            {
                transformManyBlockEmpty.Complete();
                transformManyBlock.Complete();
            });

        transformManyBlock.Completion.ContinueWith(_ =>
            {
                Console.WriteLine("TransformManyBlock (with code) completed");
            });

        transformManyBlockEmpty.Completion.ContinueWith(_ =>
        {
            Console.WriteLine("TransformManyBlock (empty) completed");
        });

    }

    public void Start()
    {
        const int numberBlocks = 100;
        const int collectionSize = 300000;


        //send collection numberBlock-times
        for (int i = 0; i < numberBlocks; i++)
        {
            List<Object1> list = new List<Object1>();
            for (int j = 0; j < collectionSize; j++)
            {
                list.Add(new Object1(j));
            }

            broadCastBlock.Post(list);
        }

        //mark broadCastBlock complete
        broadCastBlock.Complete();

        Console.WriteLine("Core routine finished");
        Console.ReadLine();
    }
}

class TransformBlockClass
{
    private Stopwatch watch;
    private bool isStarted;
    private int currentIndex;

    public TransformBlock<List<Object1>, Tuple<int, List<Object1>>> transformBlock;

    public TransformBlockClass()
    {
        isStarted = false;
        watch = new Stopwatch();

        transformBlock = new TransformBlock<List<Object1>, Tuple<int, List<Object1>>>
           (list =>
        {
            if (!isStarted)
            {
                StartUp();
                isStarted = true;
            }

            return new Tuple<int, List<Object1>>(currentIndex++, list);
        });

        transformBlock.Completion.ContinueWith(_ =>
            {
                ShutDown();
            });
    }

    private void StartUp()
    {
        watch.Start();
    }

    private void ShutDown()
    {
        watch.Stop();

        Console.WriteLine("TransformBlock : Time elapsed in ms: " 
                              + watch.ElapsedMilliseconds);
    }
}

class Object1
{
    public int val { get; private set; }

    public Object1(int val)
    {
        this.val = val;
    }
}

class Object2
{
    public int value { get; private set; }
    public List<Object1> collection { get; private set; }

    public Object2(int value, List<Object1> collection)
    {
        this.value = value;
        this.collection = collection;
    }    
}

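*EDIT: I have posted another code snippet, this time using collections of value types, and with it I cannot reproduce the issue I observe in the code above. Could it be that passing reference types around and operating on them concurrently (even in different dataflow blocks) can block and cause contention?*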

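using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program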
{
    static void Main(string[] args)
    {
        Test test = new Test();
        test.Start();
    }
}

class Test
{
    private BroadcastBlock<List<int>> broadCastBlock;
    private TransformBlock<List<int>, List<int>> tfb11;
    private TransformBlock<List<int>, List<int>> tfb12;
    private TransformBlock<List<int>, List<int>> tfb21;
    private TransformBlock<List<int>, List<int>> tfb22;
    private TransformManyBlock<List<int>, List<int>> transformManyBlock1;
    private TransformManyBlock<List<int>, List<int>> transformManyBlock2;
    private ActionBlock<List<int>> actionBlock1;
    private ActionBlock<List<int>> actionBlock2;

    public Test()
    {
        broadCastBlock = new BroadcastBlock<List<int>>(item => item);

        tfb11 = new TransformBlock<List<int>, List<int>>(item =>
            {
                return item;
            });

        tfb12 = new TransformBlock<List<int>, List<int>>(item =>
            {
                return item;
            });

        tfb21 = new TransformBlock<List<int>, List<int>>(item =>
            {
                return item;
            });

        tfb22 = new TransformBlock<List<int>, List<int>>(item =>
            {
                return item;
            });

        transformManyBlock1 = new TransformManyBlock<List<int>, List<int>>(item =>
            {
                Thread.Sleep(100);
                //or you can replace the Thread.Sleep(100) with actual work, 
                //no difference in results. This shows that the issue at hand is 
                //unrelated to starvation of threads.

                return new List<int>[1] { item };
            });

        transformManyBlock2 = new TransformManyBlock<List<int>, List<int>>(item =>
            {
                return new List<int>[1] { item };
            });

        actionBlock1 = new ActionBlock<List<int>>(item =>
            {
                //flush transformManyBlock
            });

        actionBlock2 = new ActionBlock<List<int>>(item =>
        {
            //flush transformManyBlock
        });

        //linking
        broadCastBlock.LinkTo(tfb11, new DataflowLinkOptions 
                                      { PropagateCompletion = true });
        broadCastBlock.LinkTo(tfb12, new DataflowLinkOptions 
                                      { PropagateCompletion = true });
        broadCastBlock.LinkTo(tfb21, new DataflowLinkOptions 
                                      { PropagateCompletion = true });
        broadCastBlock.LinkTo(tfb22, new DataflowLinkOptions 
                                      { PropagateCompletion = true });

        tfb11.LinkTo(transformManyBlock1);
        tfb12.LinkTo(transformManyBlock1);
        tfb21.LinkTo(transformManyBlock2);
        tfb22.LinkTo(transformManyBlock2);

        transformManyBlock1.LinkTo(actionBlock1
                                   , new DataflowLinkOptions 
                                     { PropagateCompletion = true }
                                   );
        transformManyBlock2.LinkTo(actionBlock2
                                   , new DataflowLinkOptions 
                                     { PropagateCompletion = true }
                                   );

        //completion
        Task.WhenAll(tfb11.Completion, tfb12.Completion).ContinueWith(_ =>
            {
                Console.WriteLine("TransformBlocks 11 and 12 completed");
                transformManyBlock1.Complete();
            });

        Task.WhenAll(tfb21.Completion, tfb22.Completion).ContinueWith(_ =>
            {
                Console.WriteLine("TransformBlocks 21 and 22 completed");
                transformManyBlock2.Complete();
            });

        transformManyBlock1.Completion.ContinueWith(_ =>
            {
                Console.WriteLine
                    ("TransformManyBlock (from tfb11 and tfb12) finished");
            });

        transformManyBlock2.Completion.ContinueWith(_ =>
            {
                Console.WriteLine
                    ("TransformManyBlock (from tfb21 and tfb22) finished");
            });
    }

    public void Start()
    {
        const int numberBlocks = 100;
        const int collectionSize = 300000;

        //send collection numberBlock-times
        for (int i = 0; i < numberBlocks; i++)
        {
            List<int> list = new List<int>();
            for (int j = 0; j < collectionSize; j++)
            {
                list.Add(j);
            }

            broadCastBlock.Post(list);
        }

        //mark broadCastBlock complete
        broadCastBlock.Complete();

        Console.WriteLine("Core routine finished");
        Console.ReadLine();
    }
}

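1 answer:

Answer 0 (score: 3)

OK, last attempt ;-)

Synopsis:

The time delta observed in scenario 1 can be fully explained by differing behavior of the garbage collector.

When running scenario 1 with transformManyBlock linked, the runtime behavior is such that garbage collections are triggered while new items (lists) are being created on the main thread. This is not the case when running scenario 1 with transformManyBlockEmpty linked.

Note that creating a new reference type instance (Object1) results in a request for memory on the GC heap, which in turn may trigger a garbage collection run. Because a large number of Object1 instances (and lists) are created, the garbage collector has considerably more work to do scanning the heap for (potentially) unreachable objects.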
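
One rough way to cross-check this explanation without a profiler is to count collections around the input-creation loop. The sketch below is only an illustration: `GcProbe` and its method are hypothetical names, `Object1` is copied from the question, and the count it reports will vary with the runtime, the GC mode and the machine.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the question's Object1 (a reference type).
class Object1
{
    public int val { get; private set; }

    public Object1(int val)
    {
        this.val = val;
    }
}

// Hypothetical helper, not part of the original code.
static class GcProbe
{
    // Builds the same kind of input as Test.Start() and returns how many
    // generation-0 collections happened while it was being built.
    public static int CountGen0CollectionsWhileAllocating(int numberBlocks, int collectionSize)
    {
        int before = GC.CollectionCount(0);

        for (int i = 0; i < numberBlocks; i++)
        {
            var list = new List<Object1>();
            for (int j = 0; j < collectionSize; j++)
            {
                list.Add(new Object1(j));
            }
            // Nothing keeps a reference to the list here, so it becomes
            // eligible for collection after each iteration.
        }

        return GC.CollectionCount(0) - before;
    }

    static void Main()
    {
        int collections = CountGen0CollectionsWhileAllocating(100, 300000);
        Console.WriteLine("Gen-0 collections during input creation: " + collections);
    }
}
```

Placing the same two `GC.CollectionCount(0)` reads around the loop in `Test.Start()` for each of the two link variants should show whether the collector really is busier in one of them.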

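Therefore, the observed difference can be minimized by any of the following:

  • Converting Object1 from a class to a struct (thus ensuring that memory for the instances is not allocated on the heap individually).
  • Keeping a reference to the generated lists (thus reducing the time the garbage collector needs to identify unreachable objects).
  • Generating all of the items before posting them to the network.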
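
The first option in the list above could look roughly like this. It is a minimal sketch under assumptions: `Object1Struct` and `StructInput` are hypothetical names, and the block types in the question would have to change from `List<Object1>` to `List<Object1Struct>` to use it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value-type counterpart of the question's Object1 class.
struct Object1Struct
{
    public int val { get; private set; }

    // ": this()" initializes the auto-property's backing field first,
    // which older C# compilers require for structs.
    public Object1Struct(int val) : this()
    {
        this.val = val;
    }
}

// Hypothetical helper, not part of the original code.
static class StructInput
{
    // Builds one input list. The elements are stored inline in the list's
    // backing array, so the loop creates no per-element heap objects.
    public static List<Object1Struct> Build(int collectionSize)
    {
        var list = new List<Object1Struct>(collectionSize);
        for (int j = 0; j < collectionSize; j++)
        {
            list.Add(new Object1Struct(j));
        }
        return list;
    }

    static void Main()
    {
        List<Object1Struct> list = Build(300000);
        Console.WriteLine("Built " + list.Count + " struct items");
    }
}
```

Each list's backing array is still a heap allocation; what goes away is the one allocation per element, which is where most of the collector's work in the original loop comes from.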

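(Note: I cannot explain why the garbage collector behaves differently in scenario 1 "transformManyBlock" compared with scenario 1 "transformManyBlockEmpty", but the data collected via the ConcurrencyVisualizer clearly shows the difference.)

Results:

(Tests were run on a Core i7 980X, 6 cores, HT enabled):

I modified scenario 2 as follows: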

// Start a stopwatch per tfb
int tfb11Cnt = 0;
Stopwatch sw11 = new Stopwatch();
tfb11 = new TransformBlock<List<int>, List<int>>(item =>
{
    if (Interlocked.CompareExchange(ref tfb11Cnt, 1, 0) == 0)
        sw11.Start();

    return item;
});

// [...]

// completion
Task.WhenAll(tfb11.Completion, tfb12.Completion).ContinueWith(_ =>
{

     Console.WriteLine("TransformBlocks 11 and 12 completed. SW11: {0}, SW12: {1}",
     sw11.ElapsedMilliseconds, sw12.ElapsedMilliseconds);
     transformManyBlock1.Complete();
});

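Results:

  1. Scenario 1 (as posted, i.e. linked to transformManyBlock)
     TransformBlock: Time elapsed in ms: 6826
     TransformBlock: Time elapsed in ms: 6826
  2. Scenario 1 (linked to transformManyBlockEmpty)
     TransformBlock: Time elapsed in ms: 3140
     TransformBlock: Time elapsed in ms: 3140
  3. Scenario 1 (transformManyBlock, Thread.Sleep(200) in the loop body)
     TransformBlock: Time elapsed in ms: 4949
     TransformBlock: Time elapsed in ms: 4950
  4. Scenario 2 (as posted, but modified to report times)
     TransformBlocks 21 and 22 completed. SW21: 619 ms, SW22: 669 ms
     TransformBlocks 11 and 12 completed. SW11: 669 ms, SW12: 667 ms

Next, I changed scenarios 1 and 2 so that the input data is prepared before it is posted to the network: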

    // Scenario 1
    //send collection numberBlock-times
    var input = new List<List<Object1>>(numberBlocks);
    for (int i = 0; i < numberBlocks; i++)
    {
        var list = new List<Object1>(collectionSize);
        for (int j = 0; j < collectionSize; j++)
        {
            list.Add(new Object1(j));
        }
        input.Add(list);
    }
    
    foreach (var inp in input)
    {
        broadCastBlock.Post(inp);
        Thread.Sleep(10);
    }
    
    // Scenario 2
    //send collection numberBlock-times
    var input = new List<List<int>>(numberBlocks);
    for (int i = 0; i < numberBlocks; i++)
    {
        List<int> list = new List<int>(collectionSize);
        for (int j = 0; j < collectionSize; j++)
        {
            list.Add(j);
        }
    
        //broadCastBlock.Post(list);
        input.Add(list);
     }
    
     foreach (var inp in input)
     {
         broadCastBlock.Post(inp);
         Thread.Sleep(10);
     }
    

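Results:

  1. Scenario 1 (transformManyBlock)
     TransformBlock: Time elapsed in ms: 1029
     TransformBlock: Time elapsed in ms: 1029
  2. Scenario 1 (transformManyBlockEmpty)
     TransformBlock: Time elapsed in ms: 975
     TransformBlock: Time elapsed in ms: 975
  3. Scenario 1 (transformManyBlock, Thread.Sleep(200) in the loop body)
     TransformBlock: Time elapsed in ms: 972
     TransformBlock: Time elapsed in ms: 972

Finally, I changed the code back to the original version, but kept a reference to each created list: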

      var lists = new List<List<Object1>>();
      for (int i = 0; i < numberBlocks; i++)
      {
          List<Object1> list = new List<Object1>();
          for (int j = 0; j < collectionSize; j++)
          {
              list.Add(new Object1(j));
          }
          lists.Add(list);                
          broadCastBlock.Post(list);
      }
      

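Results:

  1. Scenario 1 (transformManyBlock)
     TransformBlock: Time elapsed in ms: 6052
     TransformBlock: Time elapsed in ms: 6052
  2. Scenario 1 (transformManyBlockEmpty)
     TransformBlock: Time elapsed in ms: 5524
     TransformBlock: Time elapsed in ms: 5524
  3. Scenario 1 (transformManyBlock, Thread.Sleep(200) in the loop body)
     TransformBlock: Time elapsed in ms: 5098
     TransformBlock: Time elapsed in ms: 5098

Likewise, changing Object1 from a class to a struct makes both blocks complete at about the same time (and roughly 10 times faster).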


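Update: the answer below does not suffice to explain the observed behavior.

In scenario one, a tight loop is executed inside the TransformMany lambda, which hogs the CPU and starves other threads of processor resources. That is why a delay in the execution of the Completion continuation task can be observed. In scenario two, a Thread.Sleep is executed inside the TransformMany lambda, which gives other threads the chance to execute the Completion continuation task. The observed difference in runtime behavior is not related to TPL Dataflow. To reduce the observed deltas, it should suffice to introduce a Thread.Sleep inside the loop body in scenario 1: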

        for (int counter = 1; counter <= 10000000;  counter++)
        {
           double result = Math.Sqrt(counter + 1.0);
           // Back off for a little while
           Thread.Sleep(200);
        }
        

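(Below is my original answer. I did not read the OP's question carefully enough and only understood what he was asking after reading his comments. I am still leaving it here for reference.)

Are you sure that you are measuring the right thing? Note that when you do something like transformBlock.Completion.ContinueWith(_ => ShutDown());, your time measurement is influenced by the behavior of the TaskScheduler (e.g. how long it takes until the continuation task starts executing). Although I was not able to observe the difference you saw on my machine, I got more precise results (in terms of the delta between the tfb1 and tfb2 completion times) when using dedicated threads to measure the time: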

               // Within your Test.Start() method...
               Thread timewatch = new Thread(() =>
               {
                   var sw = Stopwatch.StartNew();
                   tfb1.transformBlock.Completion.Wait();
                   Console.WriteLine("tfb1.transformBlock completed within {0} ms",
                                      sw.ElapsedMilliseconds);
                });
        
                Thread timewatchempty = new Thread(() =>
                {
                    var sw = Stopwatch.StartNew();
                    tfb2.transformBlock.Completion.Wait();
                    Console.WriteLine("tfb2.transformBlock completed within {0} ms", 
                                       sw.ElapsedMilliseconds);
                });
        
                timewatch.Start();
                timewatchempty.Start();
        
                //send collection numberBlock-times
                for (int i = 0; i < numberBlocks; i++)
                {
                  // ... rest of the code
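
As a lighter variation on the dedicated-thread measurement, the continuation itself can be asked to run inline on the thread that completes the task, which may reduce the scheduling delay described above. This is a sketch under assumptions: `CompletionTiming` is a hypothetical helper, `Task.Delay` merely stands in for a block's `Completion` task, and `TaskContinuationOptions.ExecuteSynchronously` is only a request that the runtime may decline.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original code.
static class CompletionTiming
{
    // Stops the stopwatch in a continuation that is requested to run
    // synchronously when the given task completes, and returns the
    // elapsed milliseconds as the continuation's result.
    public static Task<long> StopWhenCompleted(Task completion, Stopwatch watch)
    {
        return completion.ContinueWith(_ =>
        {
            watch.Stop();
            return watch.ElapsedMilliseconds;
        }, TaskContinuationOptions.ExecuteSynchronously);
    }

    static void Main()
    {
        // Task.Delay stands in for tfb1.transformBlock.Completion here.
        var watch = Stopwatch.StartNew();
        Task<long> elapsed = StopWhenCompleted(Task.Delay(50), watch);
        Console.WriteLine("Completed after {0} ms", elapsed.Result);
    }
}
```

In the question's code this would be called as `CompletionTiming.StopWhenCompleted(transformBlock.Completion, watch)` in place of the plain `ContinueWith(_ => ShutDown())`.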