Producer-consumer model using TPL tasks in .NET 4.0

Date: 2011-10-17 02:29:33

Tags: c# multithreading .net-4.0 task-parallel-library

I have a fairly large XML file (around 1-2 GB).

The requirement is to save the XML data into a database. Currently this is done in 3 steps:

  1. Read the large file with as small a memory footprint as possible
  2. Create entities from the XML data
  3. Store the data from the created entities into the database using SqlBulkCopy

To get better performance, I want to create a producer-consumer model where the producer creates a set of entities, say in batches of 10K, and adds them to a queue. The consumer should take a batch of entities from the queue and persist them to the database using SqlBulkCopy.

Thanks, Gokul

    void Main()
    {
        int iCount = 0;
        string fileName = @"C:\Data\CatalogIndex.xml";
    
        DateTime startTime = DateTime.Now;
        Console.WriteLine("Start Time: {0}", startTime);
        FileInfo fi = new FileInfo(fileName);
        Console.WriteLine("File Size:{0} MB", fi.Length / 1048576.0);
    
        /* I want to change this loop to create a producer-consumer pattern here
           to process the data in parallel */
        foreach (var element in StreamElements(fileName, "title"))
        {
            iCount++;
        }

        Console.WriteLine("Count: {0}", iCount);
        Console.WriteLine("End Time: {0}, Time Taken:{1}", DateTime.Now, DateTime.Now - startTime);
    }
    
    private static IEnumerable<XElement> StreamElements(string fileName, string elementName)
    {
        using (var rdr = XmlReader.Create(fileName))
        {
            rdr.MoveToContent();
            while (!rdr.EOF)
            {
                if ((rdr.NodeType == XmlNodeType.Element) && (rdr.Name == elementName))
                {
                    var e = XElement.ReadFrom(rdr) as XElement;
                    yield return e;
                }
                else
                {
                    rdr.Read();
                }
            }
            rdr.Close();
        }
    }
    

1 Answer:

Answer 0 (score: 4)

Is this what you are trying to do?

    void Main()
    {
        const int inputCollectionBufferSize = 1024;
        const int bulkInsertBufferCapacity = 100;
        const int bulkInsertConcurrency = 4;

        BlockingCollection<object> inputCollection = new BlockingCollection<object>(inputCollectionBufferSize);

        Task loadTask = Task.Factory.StartNew(() =>
        {
            foreach (object nextItem in ReadAllElements(...))
            {
                // this will potentially block if there are already enough items
                inputCollection.Add(nextItem);
            }

            // mark this collection as done
            inputCollection.CompleteAdding();
        });

        Action parseAction = () =>
        {
            List<object> bulkInsertBuffer = new List<object>(bulkInsertBufferCapacity);

            foreach (object nextItem in inputCollection.GetConsumingEnumerable())
            {
                if (bulkInsertBuffer.Count == bulkInsertBufferCapacity)
                {
                    CommitBuffer(bulkInsertBuffer);
                    bulkInsertBuffer.Clear();
                }

                bulkInsertBuffer.Add(nextItem);
            }

            // commit whatever is left in the buffer once the collection is drained
            if (bulkInsertBuffer.Count > 0)
            {
                CommitBuffer(bulkInsertBuffer);
            }
        };

        List<Task> parseTasks = new List<Task>(bulkInsertConcurrency);

        for (int i = 0; i < bulkInsertConcurrency; i++)
        {
            parseTasks.Add(Task.Factory.StartNew(parseAction));
        }

        // wait before exiting
        loadTask.Wait();
        Task.WaitAll(parseTasks.ToArray());
    }
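
Note that ReadAllElements(...) and CommitBuffer(...) are left unimplemented above. On the producer side, ReadAllElements(...) could simply be the asker's StreamElements(fileName, "title"). Below is a rough, hypothetical sketch of what CommitBuffer might look like with SqlBulkCopy, assuming the buffered entities can be projected into a DataTable whose columns match the destination table; the table name, column, and connection string are placeholders, not part of the original answer.

    // Hypothetical sketch only -- not part of the original answer.
    // Requires: using System.Collections.Generic; using System.Data; using System.Data.SqlClient;
    private static void CommitBuffer(List<object> buffer)
    {
        // Flatten the buffered entities into a DataTable; the single "Title" column
        // and the ToString() projection stand in for the real entity shape.
        var table = new DataTable();
        table.Columns.Add("Title", typeof(string));

        foreach (object item in buffer)
        {
            table.Rows.Add(item.ToString());
        }

        using (var connection = new SqlConnection("<your connection string>"))
        {
            connection.Open();

            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "dbo.CatalogIndex"; // placeholder table name
                bulkCopy.WriteToServer(table);
            }
        }
    }

The batch size is a tuning knob: larger buffers (such as the 10K batches mentioned in the question) mean fewer round trips per WriteToServer call, at the cost of holding more entities in memory per consumer.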