What's the best way to do bulk inserts? Plus help me fully understand what I've found so far

Date: 2010-05-19 22:55:30

Tags: c# .net sql-server performance linq-to-sql

So I saw this post here and read through it, and it seems bulk copy may be the way to go:

What’s the best way to bulk database inserts from c#?

I still have some questions, and I want to know how things actually work.

So I found two tutorials:

http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241

http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx

The first way uses two ADO.NET 2.0 features: BulkInsert and BulkCopy. The second uses LINQ to SQL and OpenXML.

The second one appeals to me because I already use LINQ to SQL and prefer it over ADO.NET. However, as someone pointed out in that post, the author is just working around the problem at the expense of performance (nothing wrong with that, in my opinion).

First, I will discuss the two ways from the first tutorial.

I am using VS2010 Express (for testing the tutorials I used VS2008; not sure which .NET version, I just loaded up their sample files and ran them), .NET 4.0, MVC 2.0, and SQL Server 2005.

  1. Is ADO.NET 2.0 the most current version?
  2. Given the technologies I am using, are there updates to what I am about to show that would improve it somehow?
  3. Do these tutorials leave out anything I should know about?
  4. BulkInsert

    I am using this table for all the examples.

    CREATE TABLE [dbo].[TBL_TEST_TEST]
    (
        ID INT IDENTITY(1,1) PRIMARY KEY,
        [NAME] [varchar](50) 
    )
    

    The stored procedure:

    USE [Test]
    GO
    /****** Object:  StoredProcedure [dbo].[sp_BatchInsert]    Script Date: 05/19/2010 15:12:47 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    ALTER PROCEDURE [dbo].[sp_BatchInsert] (@Name VARCHAR(50) )
    AS
    BEGIN
                INSERT INTO TBL_TEST_TEST VALUES (@Name);
    END 
    

    The C# code:

    /// <summary>
    /// Another ado.net 2.0 way that uses a stored procedure to do a bulk insert.
    /// Seems slower than the "BatchBulkCopy" way, and it crashes when you try to insert 500,000 records in one go.
    /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
    /// </summary>
    private static void BatchInsert()
    {
        // Get the DataTable with Rows State as RowState.Added
        DataTable dtInsertRows = GetDataTable();
    
        SqlConnection connection = new SqlConnection(connectionString);
        SqlCommand command = new SqlCommand("sp_BatchInsert", connection);
        command.CommandType = CommandType.StoredProcedure;
        command.UpdatedRowSource = UpdateRowSource.None;
    
        // Set the Parameter with appropriate Source Column Name
        command.Parameters.Add("@Name", SqlDbType.VarChar, 50, dtInsertRows.Columns[0].ColumnName);
    
        SqlDataAdapter adpt = new SqlDataAdapter();
        adpt.InsertCommand = command;
        // Specify the number of records to be Inserted/Updated in one go. Default is 1.
        adpt.UpdateBatchSize = 1000;
    
        connection.Open();
        int recordsInserted = adpt.Update(dtInsertRows);
        connection.Close();
    }
    

    First, the batch size. Why would you set the batch size to anything other than the number of records you are sending? I am sending 500,000 records, so I made my batch size 500,000.

    Next, why does it crash when I do that? With a batch size of 1,000 it works fine.

    System.Data.SqlClient.SqlException was unhandled
      Message="A transport-level error has occurred when sending the request to the server. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.)"
      Source=".Net SqlClient Data Provider"
      ErrorCode=-2146232060
      Class=20
      LineNumber=0
      Number=233
      Server=""
      State=0
      StackTrace:
           at System.Data.Common.DbDataAdapter.UpdatedRowStatusErrors(RowUpdatedEventArgs rowUpdatedEvent, BatchCommandInfo[] batchCommands, Int32 commandCount)
           at System.Data.Common.DbDataAdapter.UpdatedRowStatus(RowUpdatedEventArgs rowUpdatedEvent, BatchCommandInfo[] batchCommands, Int32 commandCount)
           at System.Data.Common.DbDataAdapter.Update(DataRow[] dataRows, DataTableMapping tableMapping)
           at System.Data.Common.DbDataAdapter.UpdateFromDataTable(DataTable dataTable, DataTableMapping tableMapping)
           at System.Data.Common.DbDataAdapter.Update(DataTable dataTable)
           at TestIQueryable.Program.BatchInsert() in C:\Users\a\Downloads\TestIQueryable\TestIQueryable\TestIQueryable\Program.cs:line 124
           at TestIQueryable.Program.Main(String[] args) in C:\Users\a\Downloads\TestIQueryable\TestIQueryable\TestIQueryable\Program.cs:line 16
      InnerException: 
    

    Inserting 500,000 records with a batch size of 1,000 took 2 minutes 54 seconds.

    Of course, that is not an official time; it was just me sitting there with a stopwatch (I am sure there are better ways to measure, but I was too lazy to look for them).

    So I found that a bit slow compared to all my other attempts (except the LINQ to SQL insert-one-row-at-a-time way), and I am not sure why.

    Next I looked at bulk copy.

    /// <summary>
    /// An ado.net 2.0 way to mass insert records. This seems to be the fastest.
    /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
    /// </summary>
    private static void BatchBulkCopy()
    {
        // Get the DataTable 
        DataTable dtInsertRows = GetDataTable();
    
        using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
        {
            sbc.DestinationTableName = "TBL_TEST_TEST";
    
            // Number of records to be processed in one go
            sbc.BatchSize = 500000;
    
            // Map the source columns from the DataTable to the destination columns in the SQL Server table
            // sbc.ColumnMappings.Add("ID", "ID");
            sbc.ColumnMappings.Add("NAME", "NAME");
    
            // Number of records after which client has to be notified about its status
            sbc.NotifyAfter = dtInsertRows.Rows.Count;
    
            // Event that gets fired when NotifyAfter number of records are processed.
            sbc.SqlRowsCopied += new SqlRowsCopiedEventHandler(sbc_SqlRowsCopied);
    
            // Finally write to server
            sbc.WriteToServer(dtInsertRows);
            sbc.Close();
        }
    
    }
    

    This one seems to go extremely fast, and it does not even need an SP (can you use an SP with bulk copy? And if so, would it be better?).
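    On the SP question above, one pattern I have seen (just a sketch; the staging table and SP names here are made up) is to bulk copy into a staging table and then call a stored procedure that processes the staged rows, so you get SqlBulkCopy speed plus SP logic:

```csharp
using System.Data;
using System.Data.SqlClient;

static class StagedBulkCopy
{
    // SqlBulkCopy cannot target a stored procedure directly, but it can fill a
    // staging table that an SP then processes in one set-based statement.
    // "TBL_TEST_TEST_STAGING" and "sp_MergeStaging" are hypothetical names.
    public static void Run(DataTable rows, string connectionString)
    {
        using (SqlConnection connection = new SqlConnection(connectionString))
        {
            connection.Open();

            using (SqlBulkCopy sbc = new SqlBulkCopy(connection))
            {
                sbc.DestinationTableName = "TBL_TEST_TEST_STAGING";
                sbc.ColumnMappings.Add("NAME", "NAME");
                sbc.WriteToServer(rows);
            }

            // One set-based call that validates/moves everything just staged.
            using (SqlCommand command = new SqlCommand("sp_MergeStaging", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.ExecuteNonQuery();
            }
        }
    }
}
```

    That way the logic lives in the SP but still runs once over the whole set, instead of once per row.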

    BatchBulkCopy had no problem with a batch size of 500,000. So, again, why shrink the batch size below the number of records you want to send?

    I found that with BatchBulkCopy and a batch size of 500,000 it took only 5 seconds to finish. I then tried a batch size of 1,000 and it took only 8 seconds.

    Much faster than the BulkInsert above.

    Now I tried the other tutorial.

    USE [Test]
    GO
    /****** Object:  StoredProcedure [dbo].[spTEST_InsertXMLTEST_TEST]    Script Date: 05/19/2010 15:39:03 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    ALTER PROCEDURE [dbo].[spTEST_InsertXMLTEST_TEST](@UpdatedProdData nText)
    AS 
     DECLARE @hDoc int   
    
     exec sp_xml_preparedocument @hDoc OUTPUT,@UpdatedProdData 
    
     INSERT INTO TBL_TEST_TEST(NAME)
     SELECT XMLProdTable.NAME
        FROM OPENXML(@hDoc, 'ArrayOfTBL_TEST_TEST/TBL_TEST_TEST', 2)   
           WITH (
                    ID Int,                 
                    NAME varchar(100)
                ) XMLProdTable
    
    EXEC sp_xml_removedocument @hDoc
    

    The C# code:

    /// <summary>
    /// This is using linq to sql to make the table objects. 
    /// It is then serialized to an XML document and sent to a stored procedure
    /// that then does a bulk insert (I think with OpenXML).
    ///  http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
    /// </summary>
    private static void LinqInsertXMLBatch()
    {
        using (TestDataContext db = new TestDataContext())
        {
            TBL_TEST_TEST[] testRecords = new TBL_TEST_TEST[500000];
            for (int count = 0; count < 500000; count++)
            {
                TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                testRecord.NAME = "Name : " + count;
                testRecords[count] = testRecord;
            }
    
            StringBuilder sBuilder = new StringBuilder();
            System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
            XmlSerializer serializer = new XmlSerializer(typeof(TBL_TEST_TEST[]));
            serializer.Serialize(sWriter, testRecords);
            db.insertTestData(sBuilder.ToString());
        }
    }
    

    So I like this one because I get to use objects, even if it is a bit redundant. I do not understand how the SP works, though; I just do not get the whole thing. I do not know whether OPENXML does some kind of bulk insert under the hood, and I do not even know how to take this sample SP and change it to fit my tables, because, like I said, I do not know what is going on.

    I also do not know what happens if the object has more than one table in it. Say I have a ProductName table that has a relationship to a Product table, or something like that.

    In LINQ to SQL, you can get the ProductName object and make changes to the Product table through that same object. So I am not sure how to take that into account. I am not sure if I would have to do separate inserts, or what.
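    One way I could imagine handling the parent/child case with bulk copy (purely a sketch; the Product/ProductName tables and their columns here are hypothetical) is to generate the keys on the client, so child rows can reference their parents without a round trip to read identity values back:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

static class ParentChildBulkCopy
{
    // Client-generated keys (Guid here) let child rows reference their parents
    // without reading identity values back from the server. The "Product" and
    // "ProductName" tables and their columns are hypothetical.
    public static void Run(string connectionString)
    {
        DataTable products = new DataTable();
        products.Columns.Add("ProductId", typeof(Guid));
        products.Columns.Add("NAME");

        DataTable productNames = new DataTable();
        productNames.Columns.Add("ProductId", typeof(Guid));
        productNames.Columns.Add("NAME");

        for (int i = 0; i < 1000; i++)
        {
            Guid id = Guid.NewGuid();                 // key made on the client
            products.Rows.Add(id, "Product : " + i);
            productNames.Rows.Add(id, "Name : " + i); // child points at parent
        }

        using (SqlConnection connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (SqlBulkCopy sbc = new SqlBulkCopy(connection))
            {
                sbc.DestinationTableName = "Product";     // parents first
                sbc.WriteToServer(products);

                sbc.DestinationTableName = "ProductName"; // then children
                sbc.WriteToServer(productNames);
            }
        }
    }
}
```

    The order matters: parents have to land before children if there is a foreign key between the two tables.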

    The time was pretty good: 52 seconds for 500,000 records.

    The last way, of course, is just to use LINQ to SQL for everything, and it is pretty terrible.

    /// <summary>
    /// This is using linq to sql to insert lots of records. 
    /// This way is slow as it uses no mass insert.
    /// Only tried to insert 50,000 records as I did not want to sit around till it did 500,000 records.
    /// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
    /// </summary>
    private static void LinqInsertAll()
    {
        using (TestDataContext db = new TestDataContext())
        {
            db.CommandTimeout = 600;
            for (int count = 0; count < 50000; count++)
            {
                TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                testRecord.NAME = "Name : " + count;
                db.TBL_TEST_TESTs.InsertOnSubmit(testRecord);
            }
            db.SubmitChanges();
        }
    }
    

    I only did 50,000 records, and it took over a minute.

    So I have really narrowed it down to the LINQ to SQL bulk-insert (XML) way or bulk copy. I am just not sure how to handle relationships with either of them. I am also not sure how either one stands up when doing updates rather than inserts, since I have not tried that yet.

    I do not think I will ever need to insert/update more than 50,000 records in one go, but at the same time I know I will have to validate records before inserting, so that will slow things down; that makes the LINQ to SQL objects nicer, especially when you first parse the data from an XML file before inserting it into the database.
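    For the validate-then-insert case, a sketch of what I mean (the validation rule here is made up) would be to run the checks over the LINQ to SQL objects first and only flatten the rows that pass into the DataTable that goes to SqlBulkCopy:

```csharp
using System.Data;

static class ValidatedTableBuilder
{
    // Checks run over the LINQ to SQL objects first; only rows that pass get
    // flattened into the DataTable handed to SqlBulkCopy.
    // The "blank or over 50 chars" rule is just an example.
    public static DataTable Build(TBL_TEST_TEST[] records)
    {
        DataTable table = new DataTable();
        table.Columns.Add("NAME");

        foreach (TBL_TEST_TEST record in records)
        {
            if (string.IsNullOrEmpty(record.NAME) || record.NAME.Length > 50)
                continue; // hypothetical validation rule

            table.Rows.Add(record.NAME);
        }
        return table;
    }
}
```

    So the validation cost stays on the client, and the actual insert is still one fast bulk copy.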

    The full C# code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Xml.Serialization;
    using System.Data;
    using System.Data.SqlClient;
    
    namespace TestIQueryable
    {
        class Program
        {
            private static string connectionString = "";
            static void Main(string[] args)
            {
                BatchInsert();
                Console.WriteLine("done");
            }
    
            /// <summary>
            /// This is using linq to sql to insert lots of records. 
            /// This way is slow as it uses no mass insert.
            /// Only tried to insert 50,000 records as I did not want to sit around till it did 500,000 records.
            /// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
            /// </summary>
            private static void LinqInsertAll()
            {
                using (TestDataContext db = new TestDataContext())
                {
                    db.CommandTimeout = 600;
                    for (int count = 0; count < 50000; count++)
                    {
                        TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                        testRecord.NAME = "Name : " + count;
                        db.TBL_TEST_TESTs.InsertOnSubmit(testRecord);
                    }
                    db.SubmitChanges();
                }
            }
    
            /// <summary>
            /// This is using linq to sql to make the table objects. 
            /// It is then serialized to an XML document and sent to a stored procedure
            /// that then does a bulk insert (I think with OpenXML).
            ///  http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
            /// </summary>
            private static void LinqInsertXMLBatch()
            {
                using (TestDataContext db = new TestDataContext())
                {
                    TBL_TEST_TEST[] testRecords = new TBL_TEST_TEST[500000];
                    for (int count = 0; count < 500000; count++)
                    {
                        TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                        testRecord.NAME = "Name : " + count;
                        testRecords[count] = testRecord;
                    }
    
                    StringBuilder sBuilder = new StringBuilder();
                    System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
                    XmlSerializer serializer = new XmlSerializer(typeof(TBL_TEST_TEST[]));
                    serializer.Serialize(sWriter, testRecords);
                    db.insertTestData(sBuilder.ToString());
                }
            }
    
            /// <summary>
            /// An ado.net 2.0 way to mass insert records. This seems to be the fastest.
            /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
            /// </summary>
            private static void BatchBulkCopy()
            {
                // Get the DataTable 
                DataTable dtInsertRows = GetDataTable();
    
                using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
                {
                    sbc.DestinationTableName = "TBL_TEST_TEST";
    
                    // Number of records to be processed in one go
                    sbc.BatchSize = 500000;
    
                    // Map the source columns from the DataTable to the destination columns in the SQL Server table
                    // sbc.ColumnMappings.Add("ID", "ID");
                    sbc.ColumnMappings.Add("NAME", "NAME");
    
                    // Number of records after which client has to be notified about its status
                    sbc.NotifyAfter = dtInsertRows.Rows.Count;
    
                    // Event that gets fired when NotifyAfter number of records are processed.
                    sbc.SqlRowsCopied += new SqlRowsCopiedEventHandler(sbc_SqlRowsCopied);
    
                    // Finally write to server
                    sbc.WriteToServer(dtInsertRows);
                    sbc.Close();
                }
    
            }
    
    
            /// <summary>
            /// Another ado.net 2.0 way that uses a stored procedure to do a bulk insert.
            /// Seems slower than the "BatchBulkCopy" way, and it crashes when you try to insert 500,000 records in one go.
            /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
            /// </summary>
            private static void BatchInsert()
            {
                // Get the DataTable with Rows State as RowState.Added
                DataTable dtInsertRows = GetDataTable();
    
                SqlConnection connection = new SqlConnection(connectionString);
                SqlCommand command = new SqlCommand("sp_BatchInsert", connection);
                command.CommandType = CommandType.StoredProcedure;
                command.UpdatedRowSource = UpdateRowSource.None;
    
                // Set the Parameter with appropriate Source Column Name
                command.Parameters.Add("@Name", SqlDbType.VarChar, 50, dtInsertRows.Columns[0].ColumnName);
    
                SqlDataAdapter adpt = new SqlDataAdapter();
                adpt.InsertCommand = command;
                // Specify the number of records to be Inserted/Updated in one go. Default is 1.
                adpt.UpdateBatchSize = 500000;
    
                connection.Open();
                int recordsInserted = adpt.Update(dtInsertRows);
                connection.Close();
            }
    
    
    
            private static DataTable GetDataTable()
            {
                // You First need a DataTable and have all the insert values in it
                DataTable dtInsertRows = new DataTable();
                dtInsertRows.Columns.Add("NAME");
    
                for (int i = 0; i < 500000; i++)
                {
                    DataRow drInsertRow = dtInsertRows.NewRow();
                    string name = "Name : " + i;
                    drInsertRow["NAME"] = name;
                    dtInsertRows.Rows.Add(drInsertRow);
    
    
                }
                return dtInsertRows;
    
            }
    
    
            static void sbc_SqlRowsCopied(object sender, SqlRowsCopiedEventArgs e)
            {
                Console.WriteLine("Number of records affected : " + e.RowsCopied.ToString());
            }
    
    
        }
    }
    

3 Answers:

Answer 0 (score: 2)

The batch size is there to reduce the impact of network latency. It does not need to be more than a few thousand. Multiple statements are collected together and sent as one unit, so you take the hit of one network trip per N statements, instead of one per statement.
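A sketch of what that means for the adapter code in the question (the helper name and the value 2000 are just illustrative):

```csharp
using System.Data;
using System.Data.SqlClient;

static class ModerateBatchInsert
{
    // The questioner's adapter-based insert, but with a batch size of a few
    // thousand: large enough to amortize the network round trip, small enough
    // to avoid building one enormous request.
    public static int Run(DataTable rows, SqlCommand insertCommand,
                          SqlConnection connection)
    {
        SqlDataAdapter adapter = new SqlDataAdapter();
        adapter.InsertCommand = insertCommand;
        adapter.UpdateBatchSize = 2000; // roughly one network trip per 2000 statements

        connection.Open();
        int inserted = adapter.Update(rows);
        connection.Close();
        return inserted;
    }
}
```

Past a few thousand rows per batch, the per-trip latency is already amortized, so a larger value buys little and just makes each request bigger.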

Answer 1 (score: 0)

Is this a one-time bulk copy, or a regular one?

If it is one-time, or runs, say, once per day, use BCP; it will be faster because it uses a special API that is faster than ADO.NET.

Answer 2 (score: 0)

Heh, I am glad we are not the only ones suffering from InsertOnSubmit() problems.

Recently, our company moved data centers, and afterwards our SQL Server machine was 6,000 miles away from us instead of in the same country.

Suddenly, saving a batch of 1,800 records took 3.5 minutes instead of 3-4 seconds. Our users were not happy!

The solution was to replace the InsertOnSubmit calls with a bulk-insert library.

Read the "Inserting records using Bulk Insert" section on the page linked below. It shows the (surprisingly few) changes you actually need to make to your code to get around this latency issue.

You just need to add three lines of code, and use a couple of the C# classes provided on that page.

http://mikesknowledgebase.com/pages/LINQ/InsertAndDeletes.htm