Question

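I implemented a TVP + stored procedure insert strategy because I need to insert large numbers of rows (possibly concurrently) while getting some information back, such as the generated Ids. Initially I used the EF Code First approach to generate the database structure. My entities: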

FacilityGroup

public class FacilityGroup
{
    public int Id { get; set; }

    [Required]
    public string Name { get; set; }

    public string InternalNotes { get; set; }

    public virtual List<FacilityInstance> Facilities { get; set; } = new List<FacilityInstance>();
}

FacilityInstance

public class FacilityInstance
{
    public int Id { get; set; }

    [Required]
    [Index("IX_FacilityName")]
    [StringLength(450)]
    public string Name { get; set; }

    [Required]
    public string FacilityCode { get; set; }

    //[Required]
    public virtual FacilityGroup FacilityGroup { get; set; }

    [ForeignKey(nameof(FacilityGroup))]
    [Index("IX_FacilityGroupId")]
    public int FacilityGroupId { get; set; }

    public virtual List<DataBatch> RelatedBatches { get; set; } = new List<DataBatch>();

    public virtual HashSet<BatchRecord> BatchRecords { get; set; } = new HashSet<BatchRecord>();
}

BatchRecord

public class BatchRecord
{
    public long Id { get; set; }

    //todo index?
    public string ItemName { get; set; }

    [Index("IX_Supplier")]
    [StringLength(450)]
    public string Supplier { get; set; }

    public decimal Quantity { get; set; }

    public string ItemUnit { get; set; }

    public string EntityUnit { get; set; }

    public decimal ItemSize { get; set; }

    public decimal PackageSize { get; set; }

    [Index("IX_FamilyCode")]
    [Required]
    [StringLength(4)]
    public string FamilyCode { get; set; }

    [Required]
    public string Family { get; set; }

    [Index("IX_CategoryCode")]
    [Required]
    [StringLength(16)]
    public string CategoryCode { get; set; }

    [Required]
    public string Category { get; set; }

    [Index("IX_SubCategoryCode")]
    [Required]
    [StringLength(16)]
    public string SubCategoryCode { get; set; }

    [Required]
    public string SubCategory { get; set; }

    public string ItemGroupCode { get; set; }

    public string ItemGroup { get; set; }

    public decimal PurchaseValue { get; set; }

    public decimal UnitPurchaseValue { get; set; }

    public decimal PackagePurchaseValue { get; set; }

    [Required]
    public virtual DataBatch DataBatch { get; set; }

    [ForeignKey(nameof(DataBatch))]
    public int DataBatchId { get; set; }

    [Required]
    public virtual FacilityInstance FacilityInstance { get; set; }

    [ForeignKey(nameof(FacilityInstance))]
    [Index("IX_FacilityInstance")]
    public int FacilityInstanceId { get; set; }

    [Required]
    public virtual Currency Currency { get; set; }

    [ForeignKey(nameof(Currency))]
    public int CurrencyId { get; set; }
}

DataBatch

public class DataBatch
{
    public int Id { get; set; }

    [Required]
    public string Name { get; set; }

    public DateTime DateCreated { get; set; }

    public BatchStatus BatchStatus { get; set; }

    public virtual List<FacilityInstance> RelatedFacilities { get; set; } = new List<FacilityInstance>();

    public virtual HashSet<BatchRecord> BatchRecords { get; set; } = new HashSet<BatchRecord>();
}

Then my SQL Server code. The TVP structure:

CREATE TYPE dbo.RecordImportStructure 
AS TABLE (
ItemName VARCHAR(MAX),
Supplier VARCHAR(MAX),
Quantity DECIMAL(18, 2),
ItemUnit VARCHAR(MAX),
EntityUnit VARCHAR(MAX),
ItemSize DECIMAL(18, 2),
PackageSize DECIMAL(18, 2),
FamilyCode VARCHAR(4),
Family VARCHAR(MAX),
CategoryCode VARCHAR(MAX),
Category VARCHAR(MAX),
SubCategoryCode VARCHAR(MAX),
SubCategory VARCHAR(MAX),
ItemGroupCode VARCHAR(MAX),
ItemGroup VARCHAR(MAX),
PurchaseValue DECIMAL(18, 2),
UnitPurchaseValue DECIMAL(18, 2),
PackagePurchaseValue DECIMAL(18, 2),
FacilityCode VARCHAR(MAX),
CurrencyCode VARCHAR(MAX)
);

The insert stored procedure:

CREATE PROCEDURE dbo.ImportBatchRecords (
    @BatchId INT,
    @ImportTable dbo.RecordImportStructure READONLY
)
AS
SET NOCOUNT ON;

DECLARE     @ErrorCode  int  
DECLARE     @Step  varchar(200)

--Clear old stuff?
--TRUNCATE TABLE dbo.BatchRecords; 

INSERT INTO dbo.BatchRecords (
    ItemName,
    Supplier,
    Quantity,
    ItemUnit,
    EntityUnit,
    ItemSize,
    PackageSize,
    FamilyCode,
    Family,
    CategoryCode,
    Category,
    SubCategoryCode,
    SubCategory,
    ItemGroupCode,
    ItemGroup,
    PurchaseValue,
    UnitPurchaseValue,
    PackagePurchaseValue,
    DataBatchId,
    FacilityInstanceId,
    CurrencyId
)
    OUTPUT INSERTED.Id
    SELECT
    ItemName,
    Supplier,
    Quantity,
    ItemUnit,
    EntityUnit,
    ItemSize,
    PackageSize,
    FamilyCode,
    Family,
    CategoryCode,
    Category,
    SubCategoryCode,
    SubCategory,
    ItemGroupCode,
    ItemGroup,
    PurchaseValue,
    UnitPurchaseValue,
    PackagePurchaseValue,
    @BatchId,
    --FacilityInstanceId,
    --CurrencyId
    (SELECT TOP 1 f.Id from dbo.FacilityInstances f WHERE f.FacilityCode=FacilityCode),
    (SELECT TOP 1 c.Id from dbo.Currencies c WHERE c.CurrencyCode=CurrencyCode) 
    FROM    @ImportTable;

Finally, my quick, test-only solution for executing all this on the .NET side:

public class BatchRecordDataHandler : IBulkDataHandler<BatchRecordImportItem>
{
    public async Task<int> ImportAsync(SqlConnection conn, SqlTransaction transaction, IEnumerable<BatchRecordImportItem> src)
    {
        using (var cmd = new SqlCommand())
        {
            cmd.CommandText = "ImportBatchRecords";
            cmd.Connection = conn;
            cmd.Transaction = transaction;
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.CommandTimeout = 600;

            var batchIdParam = new SqlParameter
            {
                ParameterName = "@BatchId",
                SqlDbType = SqlDbType.Int,
                Value = 1
            };

            var tableParam = new SqlParameter
            {
                ParameterName = "@ImportTable",
                TypeName = "dbo.RecordImportStructure",
                SqlDbType = SqlDbType.Structured,
                Value = DataToSqlRecords(src)
            };

            cmd.Parameters.Add(batchIdParam);
            cmd.Parameters.Add(tableParam);

            cmd.Transaction = transaction;

            using (var res = await cmd.ExecuteReaderAsync())
            {
                var resultTable = new DataTable();
                resultTable.Load(res);

                var cnt = resultTable.AsEnumerable().Count();

                return cnt;
            }
        }
    }

    private IEnumerable<SqlDataRecord> DataToSqlRecords(IEnumerable<BatchRecordImportItem> src)
    {
        var tvpSchema = new[] {
            new SqlMetaData("ItemName", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("Supplier", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("Quantity", SqlDbType.Decimal),
            new SqlMetaData("ItemUnit", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("EntityUnit", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("ItemSize", SqlDbType.Decimal),
            new SqlMetaData("PackageSize", SqlDbType.Decimal),
            new SqlMetaData("FamilyCode", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("Family", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("CategoryCode", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("Category", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("SubCategoryCode", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("SubCategory", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("ItemGroupCode", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("ItemGroup", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("PurchaseValue", SqlDbType.Decimal),
            new SqlMetaData("UnitPurchaseValue", SqlDbType.Decimal),
            new SqlMetaData("PackagePurchaseValue", SqlDbType.Decimal),
            new SqlMetaData("FacilityInstanceId", SqlDbType.VarChar, SqlMetaData.Max),
            new SqlMetaData("CurrencyId", SqlDbType.VarChar, SqlMetaData.Max),
        };

        var dataRecord = new SqlDataRecord(tvpSchema);

        foreach (var importItem in src)
        {
            dataRecord.SetValues(importItem.ItemName,
                importItem.Supplier,
                importItem.Quantity,
                importItem.ItemUnit,
                importItem.EntityUnit,
                importItem.ItemSize,
                importItem.PackageSize,
                importItem.FamilyCode,
                importItem.Family,
                importItem.CategoryCode,
                importItem.Category,
                importItem.SubCategoryCode,
                importItem.SubCategory,
                importItem.ItemGroupCode,
                importItem.ItemGroup,
                importItem.PurchaseValue,
                importItem.UnitPurchaseValue,
                importItem.PackagePurchaseValue,
                importItem.FacilityCode,
                importItem.CurrencyCode);

            yield return dataRecord;
        }
    }
}

The import entity structure:

public class BatchRecordImportItem
{
    public string ItemName { get; set; }

    public string Supplier { get; set; }

    public decimal Quantity { get; set; }

    public string ItemUnit { get; set; }

    public string EntityUnit { get; set; }

    public decimal ItemSize { get; set; }

    public decimal PackageSize { get; set; }

    public string FamilyCode { get; set; }

    public string Family { get; set; }

    public string CategoryCode { get; set; }

    public string Category { get; set; }

    public string SubCategoryCode { get; set; }

    public string SubCategory { get; set; }

    public string ItemGroupCode { get; set; }

    public string ItemGroup { get; set; }

    public decimal PurchaseValue { get; set; }

    public decimal UnitPurchaseValue { get; set; }

    public decimal PackagePurchaseValue { get; set; }

    public int DataBatchId { get; set; }

    public string FacilityCode { get; set; }

    public string CurrencyCode { get; set; }
}

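Lastly, please don't mind the useless reader; it doesn't really do much. Without the reader, inserting 2.5kk (2.5 million) rows took around 26 minutes, while SqlBulkCopy took roughly 6 minutes. Am I doing something fundamentally wrong? In case it matters, I'm using IsolationLevel.Snapshot. I'm on SQL Server 2014 and am free to change the database structure and indexes.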

UPD 1

Made several tweaks and improvement attempts as described by @Xedni, specifically:

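Limited all string fields that had no maximum length to a fixed length
Changed all TVP members from VARCHAR(MAX) to VARCHAR(*SomeValue*)
Added a unique index to FacilityInstance -> FacilityCode
Added a unique index to Currency -> CurrencyCode
Tried adding WITH RECOMPILE to my SP
Tried using DataTable instead of IEnumerable<SqlDataRecord>
Tried splitting the data into smaller buckets, 50k and 100k rows per SP execution instead of 2.5kk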

My structure now looks like this:

CREATE TYPE dbo.RecordImportStructure 
AS TABLE (
ItemName VARCHAR(4096),
Supplier VARCHAR(450),
Quantity DECIMAL(18, 2),
ItemUnit VARCHAR(2048),
EntityUnit VARCHAR(2048),
ItemSize DECIMAL(18, 2),
PackageSize DECIMAL(18, 2),
FamilyCode VARCHAR(16),
Family VARCHAR(512),
CategoryCode VARCHAR(16),
Category VARCHAR(512),
SubCategoryCode VARCHAR(16),
SubCategory VARCHAR(512),
ItemGroupCode VARCHAR(16),
ItemGroup VARCHAR(512),
PurchaseValue DECIMAL(18, 2),
UnitPurchaseValue DECIMAL(18, 2),
PackagePurchaseValue DECIMAL(18, 2),
FacilityCode VARCHAR(450),
CurrencyCode VARCHAR(4)
);

So far there is no noticeable performance gain: 26-28 minutes, the same as before.

UPD 2
Checked the execution plan. Are the indexes my bane?

UPD 3
Added OPTION (RECOMPILE); at the end of my SP and gained a small boost; now sitting at ~25 minutes for 2.5kk rows.

Answer 1

You can set trace flag 2453:

FIX: Poor performance when you use table variables in SQL Server 2012 or SQL Server 2014

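When you use a table variable in a batch or procedure, the query is compiled and optimized for the initial empty state of the table variable. If this table variable is populated with many rows at runtime, the precompiled query plan may no longer be optimal. For example, the query may be joining the table variable with a nested loop, since that is usually more efficient for a small number of rows. This query plan can be inefficient if the table variable has millions of rows. A hash join may be a better choice under such conditions. To get a new query plan, the query needs to be recompiled. Unlike other user or temporary tables, however, a row count change in a table variable does not trigger a query recompile. Typically, you can work around this with OPTION (RECOMPILE), which has its own overhead cost.

Trace flag 2453 allows the benefit of a query recompile without OPTION (RECOMPILE). This trace flag differs from OPTION (RECOMPILE) in two main aspects.

(1) It uses the same row count threshold as other tables. Unlike with OPTION (RECOMPILE), the query does not need to be compiled for every execution. It triggers a recompile only when the row count change exceeds the predefined threshold.

(2) OPTION (RECOMPILE) forces the query to peek at parameters and optimize the query for them. This trace flag does not force parameter peeking.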

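You can turn on trace flag 2453 to allow a table variable to trigger a recompile when a sufficient number of rows change. This may allow the query optimizer to choose a more efficient plan.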
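
As a rough sketch, and assuming you have sysadmin rights and a build that includes the fix from the article quoted above, the flag could be switched on for the whole instance and then checked like this:

```sql
-- Enable trace flag 2453 for the whole instance (-1 = global scope).
-- A flag set this way lasts only until the instance is restarted.
DBCC TRACEON (2453, -1);

-- Report whether the flag is currently active.
DBCC TRACESTATUS (2453, -1);
```

To keep the flag across restarts it would have to be added as a -T2453 startup parameter instead. Whether it helps in this particular case would still need to be measured.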

Answer 2

I'd guess your proc could use some love. It's hard to say for sure without seeing an execution plan, but here are some thoughts.

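SQL Server always assumes that a table variable (which is essentially what a table-valued parameter is) contains exactly 1 row, even if it doesn't. In many cases this doesn't matter, but you have two correlated subqueries in the insert list, and that is where I would focus my attention. Because of that cardinality estimate, it is more than likely hammering that poor table variable with a bunch of nested loop joins. I would consider putting the rows from your TVP into a temp table, updating the temp table with the IDs from FacilityInstances and Currencies, and then doing the final insert from that.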
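
A minimal sketch of that temp-table approach, reusing the table and column names from the question. The procedure name ImportBatchRecords_Staged and the #Staging table are made up for illustration, and the sketch assumes FacilityCode and CurrencyCode are unique in their lookup tables:

```sql
-- Hypothetical variant of the procedure; the name is illustrative only.
CREATE PROCEDURE dbo.ImportBatchRecords_Staged (
    @BatchId INT,
    @ImportTable dbo.RecordImportStructure READONLY
)
AS
BEGIN
    SET NOCOUNT ON;

    -- 1. Copy the TVP rows into a temp table (which, unlike a table variable,
    --    has statistics) and add two nullable columns for the resolved IDs.
    SELECT it.*,
           CAST(NULL AS INT) AS FacilityInstanceId,
           CAST(NULL AS INT) AS CurrencyId
    INTO   #Staging
    FROM   @ImportTable AS it;

    -- 2. Resolve the codes to IDs with set-based joins.
    --    Rows whose code has no match keep a NULL ID.
    UPDATE s
    SET    FacilityInstanceId = f.Id
    FROM   #Staging AS s
    JOIN   dbo.FacilityInstances AS f ON f.FacilityCode = s.FacilityCode;

    UPDATE s
    SET    CurrencyId = c.Id
    FROM   #Staging AS s
    JOIN   dbo.Currencies AS c ON c.CurrencyCode = s.CurrencyCode;

    -- 3. Final insert from the staging table, returning the new Ids.
    INSERT INTO dbo.BatchRecords (
        ItemName, Supplier, Quantity, ItemUnit, EntityUnit, ItemSize, PackageSize,
        FamilyCode, Family, CategoryCode, Category, SubCategoryCode, SubCategory,
        ItemGroupCode, ItemGroup, PurchaseValue, UnitPurchaseValue, PackagePurchaseValue,
        DataBatchId, FacilityInstanceId, CurrencyId
    )
    OUTPUT INSERTED.Id
    SELECT
        ItemName, Supplier, Quantity, ItemUnit, EntityUnit, ItemSize, PackageSize,
        FamilyCode, Family, CategoryCode, Category, SubCategoryCode, SubCategory,
        ItemGroupCode, ItemGroup, PurchaseValue, UnitPurchaseValue, PackagePurchaseValue,
        @BatchId, FacilityInstanceId, CurrencyId
    FROM #Staging;
END;
```

If a code has no match, the NULL ID will make the final insert fail on the non-nullable foreign key columns, so it may be worth checking #Staging for NULLs before step 3.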

Answer 3

Try the following stored procedure:

CREATE PROCEDURE dbo.ImportBatchRecords (
    @BatchId INT,
    @ImportTable dbo.RecordImportStructure READONLY
)
AS
    SET NOCOUNT ON;

    DECLARE     @ErrorCode  int  
    DECLARE     @Step  varchar(200)


    CREATE TABLE #FacilityInstances
    (
        Id int NOT NULL,
        FacilityCode varchar(512) NOT NULL UNIQUE WITH (IGNORE_DUP_KEY=ON)
    );

    CREATE TABLE #Currencies
    (
        Id int NOT NULL,
        CurrencyCode varchar(512) NOT NULL UNIQUE WITH (IGNORE_DUP_KEY = ON)
    )

    INSERT INTO #FacilityInstances(Id, FacilityCode)
    SELECT Id, FacilityCode FROM dbo.FacilityInstances
    WHERE FacilityCode IS NOT NULL AND Id IS NOT NULL;

    INSERT INTO #Currencies(Id, CurrencyCode)
    SELECT Id, CurrencyCode FROM dbo.Currencies
    WHERE CurrencyCode IS NOT NULL AND Id IS NOT NULL


    INSERT INTO dbo.BatchRecords (
        ItemName,
        Supplier,
        Quantity,
        ItemUnit,
        EntityUnit,
        ItemSize,
        PackageSize,
        FamilyCode,
        Family,
        CategoryCode,
        Category,
        SubCategoryCode,
        SubCategory,
        ItemGroupCode,
        ItemGroup,
        PurchaseValue,
        UnitPurchaseValue,
        PackagePurchaseValue,
        DataBatchId,
        FacilityInstanceId,
        CurrencyId
    )
    OUTPUT INSERTED.Id
    SELECT
        ItemName,
        Supplier,
        Quantity,
        ItemUnit,
        EntityUnit,
        ItemSize,
        PackageSize,
        FamilyCode,
        Family,
        CategoryCode,
        Category,
        SubCategoryCode,
        SubCategory,
        ItemGroupCode,
        ItemGroup,
        PurchaseValue,
        UnitPurchaseValue,
        PackagePurchaseValue,
        @BatchId,
        F.Id,
        C.Id
    FROM   
        #FacilityInstances F RIGHT OUTER HASH JOIN 
        (
            #Currencies C 
            RIGHT OUTER HASH JOIN @ImportTable IT 
                ON C.CurrencyCode = IT.CurrencyCode
        )
        ON F.FacilityCode = IT.FacilityCode

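This forces the execution plan to use hash match joins instead of nested loops. I think the culprit of the poor performance is the first nested loop, which performs an index scan for each row in @ImportTable.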

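I don't know whether CurrencyCode is unique in the Currencies table, so I create the temporary table #Currencies with unique currency codes.

I don't know whether FacilityCode is unique in the FacilityInstances table, so I create the temporary table #FacilityInstances with unique facility codes.

If they are unique, you don't need the temporary tables; you can use the permanent tables directly.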
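
One way to check, sketched here against the table and column names from the question, is to look for codes that occur more than once:

```sql
-- Each query lists the codes that appear more than once.
-- An empty result means the column is unique in that table.
SELECT FacilityCode, COUNT(*) AS Occurrences
FROM   dbo.FacilityInstances
GROUP BY FacilityCode
HAVING COUNT(*) > 1;

SELECT CurrencyCode, COUNT(*) AS Occurrences
FROM   dbo.Currencies
GROUP BY CurrencyCode
HAVING COUNT(*) > 1;
```

If both queries come back empty, adding unique indexes on those columns would also guarantee that it stays that way.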

Assuming CurrencyCode and FacilityCode are unique, the following stored procedure would be better because it doesn't create unnecessary temporary tables:

CREATE PROCEDURE dbo.ImportBatchRecords (
    @BatchId INT,
    @ImportTable dbo.RecordImportStructure READONLY
)
AS
    SET NOCOUNT ON;

    DECLARE     @ErrorCode  int  
    DECLARE     @Step  varchar(200)



    INSERT INTO dbo.BatchRecords (
        ItemName,
        Supplier,
        Quantity,
        ItemUnit,
        EntityUnit,
        ItemSize,
        PackageSize,
        FamilyCode,
        Family,
        CategoryCode,
        Category,
        SubCategoryCode,
        SubCategory,
        ItemGroupCode,
        ItemGroup,
        PurchaseValue,
        UnitPurchaseValue,
        PackagePurchaseValue,
        DataBatchId,
        FacilityInstanceId,
        CurrencyId
    )
    OUTPUT INSERTED.Id
    SELECT
        ItemName,
        Supplier,
        Quantity,
        ItemUnit,
        EntityUnit,
        ItemSize,
        PackageSize,
        FamilyCode,
        Family,
        CategoryCode,
        Category,
        SubCategoryCode,
        SubCategory,
        ItemGroupCode,
        ItemGroup,
        PurchaseValue,
        UnitPurchaseValue,
        PackagePurchaseValue,
        @BatchId,
        F.Id,
        C.Id
    FROM   
        dbo.FacilityInstances F RIGHT OUTER HASH JOIN 
        (
            dbo.Currencies C 
            RIGHT OUTER HASH JOIN @ImportTable IT 
                ON C.CurrencyCode = IT.CurrencyCode
        )
        ON F.FacilityCode = IT.FacilityCode

Answer 4

Well... why not just use SqlBulkCopy? There are plenty of solutions out there that help you convert a collection of entities into an IDataReader object that can be handed directly to SqlBulkCopy.

This is a good start...

https://github.com/matthewschrager/Repository/blob/master/Repository.EntityFramework/EntityDataReader.cs

Then it becomes as simple as...

SqlBulkCopy bulkCopy = new SqlBulkCopy(connection);
IDataReader dataReader = storeEntities.AsDataReader();
bulkCopy.WriteToServer(dataReader);

I've used this code. The one caveat is that you need to be quite careful about the definition of your entity: the order of the properties in the entity determines the order of the columns exposed by the IDataReader, and this needs to match the order of the columns in the table that you are bulk copying to.

Alternatively there's other code here..

https://www.codeproject.com/Tips/1114089/Entity-Framework-Performance-Tuning-Using-SqlBulkC

Answer 5

I know there is an accepted answer, but I can't resist. I believe you can improve performance by 20-50% over the accepted answer.

The key is to SqlBulkCopy directly into the final table, dbo.BatchRecords.

To make that happen you need FacilityInstanceId and CurrencyId before the SqlBulkCopy. To get them, load SELECT Id, FacilityCode FROM FacilityInstances and SELECT Id, CurrencyCode FROM Currencies into collections, then build dictionaries:

var facilityIdByFacilityCode = facilitiesCollection.ToDictionary(x => x.FacilityCode, x => x.Id);
var currencyIdByCurrencyCode = currenciesCollection.ToDictionary(x => x.CurrencyCode, x => x.Id);

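Once you have the dictionaries, getting an id from a code is a constant-time operation. This is equivalent and very similar to a HASH MATCH JOIN in SQL Server, but on the client side.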

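The other barrier you need to tear down is getting the Id column of the newly inserted rows in the dbo.BatchRecords table. You can actually get the Ids before inserting the rows.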

Make the Id column "sequence driven":

CREATE SEQUENCE BatchRecords_Id_Seq START WITH 1;
CREATE TABLE BatchRecords
(
   Id int NOT NULL CONSTRAINT DF_BatchRecords_Id DEFAULT (NEXT VALUE FOR BatchRecords_Id_Seq), 

 .....

   CONSTRAINT PK_BatchRecords PRIMARY KEY (Id)

)

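You have the BatchRecords collection, so you know how many records are in it. You can then reserve a contiguous range of sequence values. Execute the following T-SQL: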

DECLARE @BatchCollectionCount int = 2500 -- Replace with the actual value
DECLARE @range_first_value sql_variant
DECLARE @range_last_value sql_variant

EXEC sp_sequence_get_range
     @sequence_name =  N'BatchRecords_Id_Seq', 
     @range_size =  @BatchCollectionCount,
     @range_first_value = @range_first_value OUTPUT, 
     @range_last_value = @range_last_value OUTPUT

SELECT 
    CAST(@range_first_value AS INT) AS range_first_value, 
    CAST(@range_last_value AS int) as range_last_value

This returns range_first_value and range_last_value. You can now assign BatchRecord.Id to each record:

int id = range_first_value;
foreach (var record in batchRecords)
{
   record.Id = id++;
}

Next you can SqlBulkCopy the batch record collection directly into the final table, dbo.BatchRecords.

To get a DataReader from an IEnumerable<T> to feed SqlBulkCopy.WriteToServer, you can use code like the one that is part of EntityLite, a micro ORM I developed.

It will be even faster if you cache facilityIdByFacilityCode and currencyIdByCurrencyCode. To make sure these dictionaries stay up to date, you can use SqlDependency or a similar technique.

Table-valued parameter insert performs poorly