Refactoring ThreadPool.QueueUserWorkItem to the Task Parallel Library

Asked: 2012-06-19 21:09:15

Tags: .net task-parallel-library azure-table-storage

I am trying to refactor ThreadPool.QueueUserWorkItem code to the Task Parallel Library (TPL), and honestly I do not understand the TPL well (yet).

In real life this would be an I/O operation: a query against Azure Table Storage by PartitionKey, so that it can be parallelized effectively.

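See the method GetAllTPL(), which contains the simple TPL code I wrote. It passes this very simple test case. Is the TPL code wrong and I just don't know it? Could I do it better? Is there anything that passes this simple test case but would fail once it is a real table query?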

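Both ThreadPool.QueueUserWorkItem and TPL give the same correct answer for this very limited test case. With a Stopwatch on a larger (but still simplified) test case, TPL comes out about twice as fast as ThreadPool.QueueUserWorkItem. TPL appears to queue work in groups of 3, whereas ThreadPool.QueueUserWorkItem queues in groups of 2; this is running only on a P4 with hyper-threading. Since Thread.Sleep is not real work, that does not tell me much.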
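The Stopwatch comparison mentioned above could be done along the following lines. This is only a minimal sketch: `TimingSketch` and `Measure` are hypothetical names that do not appear in the original code, and a single run like this is a rough wall-clock measurement, not a benchmark.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper (not part of the original code): runs the given
    // action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

It could be called as `TimingSketch.Measure(() => GetAllQWI(partitionKey))` and again with `GetAllTPL`. Both methods finish all of their queries before returning, so the lazily evaluated Intersect chain does not hide the query time.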

The ThreadPool.QueueUserWorkItem code comes from a .NET 3.5 code sample (pre-TPL). I am pretty sure TPL has a better option, but I am new to TPL. Thanks.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace WaitForConsole
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] partitionKey = new string[] { "one", "two", "three", "four" };
            IEnumerable<string> commonRowKeys = GetAllQWI(partitionKey);          
            foreach (string commonRowKey in commonRowKeys) Console.WriteLine(commonRowKey);
            Console.ReadLine();
            commonRowKeys = GetAllTPL(partitionKey);
            foreach (string commonRowKey in commonRowKeys) Console.WriteLine(commonRowKey);
            Console.ReadLine();
        }

        public static IEnumerable<string> GetAllQWI(string[] partitionKey)
        {
            // this is a code sample from .NET 3.5 and it runs on 4.0
            IEnumerable<string> finalResults = null;
            ManualResetEvent[] resetEvents = new ManualResetEvent[partitionKey.Length];
            HashSet<string>[] rowKeys = new HashSet<string>[partitionKey.Length];
            for (int i = 0; i < rowKeys.Length; i++)
            {
                resetEvents[i] = new ManualResetEvent(false);
                ThreadPool.QueueUserWorkItem(new WaitCallback((object index) =>
                {
                    Console.WriteLine("GetAllQWI " + ((int)index).ToString());
                    rowKeys[(int)index] = TableQueryGetRowKeys(partitionKey[(int)index]);
                    resetEvents[(int)index].Set();
                }), i);
            }
            try
            {
                WaitHandle.WaitAll(resetEvents);
                Console.WriteLine("WaitAll done");
                finalResults = (IEnumerable<string>)rowKeys[0];
                foreach (var thisRowKeys in rowKeys)
                {
                    finalResults = finalResults.Intersect(thisRowKeys);
                }

            }
            catch (Exception ex)
            {
                Console.WriteLine("WaitAll ex " + ex.Message);
            }
            return finalResults;
        }

        public static IEnumerable<string> GetAllTPL(string[] partitionKey)
        {
            // this is the conversion of the ThreadPool.QueueUserWorkItem to TPL 
            // seems to be working but is this optimal
            IEnumerable<string> finalResults = null;
            HashSet<string>[] rowKeys = new HashSet<string>[partitionKey.Length];
            //  how to do this in TPL
            Parallel.For(0, partitionKey.Length, i =>
            {
                Console.WriteLine("GetAllTPL " + i.ToString());
                rowKeys[i] = TableQueryGetRowKeys(partitionKey[i]);
            }); 
            //  Do I need to do anything special to wait for all the tasks to finish?
            //  Interesting that i is not necessarily in order but it does not need to be
            finalResults = (IEnumerable<string>)rowKeys[0];
            foreach (var thisRowKeys in rowKeys)
            {
                finalResults = finalResults.Intersect(thisRowKeys);
            }
            return finalResults;
        }

        public static HashSet<string> TableQueryGetRowKeys(string partitionKey)
        {
            // in real life this is an Azure table query to get all rowKeys for a partitionKey
            Thread.Sleep(10000);
            if (DateTime.Now.Millisecond % 2 == 0)
            {
                return new HashSet<string> { "alph", "beta", "gamma", "delta" };
            }
            else
            {
                return new HashSet<string> { "beta", "gamma", "delta", "epsilon" };
            }
        }
    }
}
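On the question in the comments above about waiting: Parallel.For does not return until all of its iterations have completed, so no extra wait is needed there. For comparison, here is a minimal sketch of the same fan-out written with explicit tasks and Task.WaitAll, which is the closest TPL counterpart to the ManualResetEvent array. It is an assumption-laden illustration, not a recommended final design: `TaskSketch`, `GetAllTasks` and `FakeQuery` are hypothetical names, the query is passed in as a delegate, and `FakeQuery` is a deterministic stand-in for the real table query. Each task still blocks a thread-pool thread while its query runs, so this does not make the I/O asynchronous.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TaskSketch
{
    // Hypothetical, deterministic stand-in for the Azure table query:
    // the result depends only on the length of the partition key.
    public static HashSet<string> FakeQuery(string partitionKey)
    {
        Thread.Sleep(100); // simulates I/O latency
        return partitionKey.Length % 2 == 0
            ? new HashSet<string> { "alpha", "beta", "gamma", "delta" }
            : new HashSet<string> { "beta", "gamma", "delta", "epsilon" };
    }

    // Starts one task per partition key, waits for all of them, then
    // intersects the row keys returned by every query.
    public static IEnumerable<string> GetAllTasks(
        string[] partitionKeys, Func<string, HashSet<string>> query)
    {
        if (partitionKeys.Length == 0) return Enumerable.Empty<string>();

        Task<HashSet<string>>[] tasks = partitionKeys
            .Select(key => Task.Factory.StartNew(() => query(key)))
            .ToArray();

        // Blocks until every task has finished; if any query threw,
        // the failures surface here wrapped in an AggregateException.
        Task.WaitAll(tasks);

        // Copy the first result so the intersection does not mutate it.
        HashSet<string> common = new HashSet<string>(tasks[0].Result);
        foreach (Task<HashSet<string>> task in tasks.Skip(1))
        {
            common.IntersectWith(task.Result);
        }
        return common;
    }
}
```

Because each task returns its own result, no shared array is written from multiple threads, and exceptions from the queries are reported at the wait instead of being lost inside a work item.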

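It took me a few hours to get this far. If TPL will do the job, then I will learn more about TPL rather than ThreadPool.QueueUserWorkItem. It almost makes me feel TPL is too easy, so I just want to check that I am not missing something. I took this TPL code from a compute-intensive sample.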

There is an SO post about irregular results with TPL and Azure, but in the end the problem could not be reproduced: Can Parallel.ForEach be used safely with CloudTableQuery

Another post, from the Azure team, indicates that TPL and Azure work fine together: Windows Azure: Parallelization of the code

0 Answers:

No answers