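I implemented a TVP + stored procedure insert strategy because I need to insert large numbers of rows (possibly concurrently) while still getting some information back, such as the generated Ids. Initially I used the EF Code First approach to generate the database structure. My entities: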
FacilityGroup
public class FacilityGroup
{
public int Id { get; set; }
[Required]
public string Name { get; set; }
public string InternalNotes { get; set; }
public virtual List<FacilityInstance> Facilities { get; set; } = new List<FacilityInstance>();
}
FacilityInstance
public class FacilityInstance
{
public int Id { get; set; }
[Required]
[Index("IX_FacilityName")]
[StringLength(450)]
public string Name { get; set; }
[Required]
public string FacilityCode { get; set; }
//[Required]
public virtual FacilityGroup FacilityGroup { get; set; }
[ForeignKey(nameof(FacilityGroup))]
[Index("IX_FacilityGroupId")]
public int FacilityGroupId { get; set; }
public virtual List<DataBatch> RelatedBatches { get; set; } = new List<DataBatch>();
public virtual HashSet<BatchRecord> BatchRecords { get; set; } = new HashSet<BatchRecord>();
}
BatchRecord
public class BatchRecord
{
public long Id { get; set; }
//todo index?
public string ItemName { get; set; }
[Index("IX_Supplier")]
[StringLength(450)]
public string Supplier { get; set; }
public decimal Quantity { get; set; }
public string ItemUnit { get; set; }
public string EntityUnit { get; set; }
public decimal ItemSize { get; set; }
public decimal PackageSize { get; set; }
[Index("IX_FamilyCode")]
[Required]
[StringLength(4)]
public string FamilyCode { get; set; }
[Required]
public string Family { get; set; }
[Index("IX_CategoryCode")]
[Required]
[StringLength(16)]
public string CategoryCode { get; set; }
[Required]
public string Category { get; set; }
[Index("IX_SubCategoryCode")]
[Required]
[StringLength(16)]
public string SubCategoryCode { get; set; }
[Required]
public string SubCategory { get; set; }
public string ItemGroupCode { get; set; }
public string ItemGroup { get; set; }
public decimal PurchaseValue { get; set; }
public decimal UnitPurchaseValue { get; set; }
public decimal PackagePurchaseValue { get; set; }
[Required]
public virtual DataBatch DataBatch { get; set; }
[ForeignKey(nameof(DataBatch))]
public int DataBatchId { get; set; }
[Required]
public virtual FacilityInstance FacilityInstance { get; set; }
[ForeignKey(nameof(FacilityInstance))]
[Index("IX_FacilityInstance")]
public int FacilityInstanceId { get; set; }
[Required]
public virtual Currency Currency { get; set; }
[ForeignKey(nameof(Currency))]
public int CurrencyId { get; set; }
}
DataBatch
public class DataBatch
{
public int Id { get; set; }
[Required]
public string Name { get; set; }
public DateTime DateCreated { get; set; }
public BatchStatus BatchStatus { get; set; }
public virtual List<FacilityInstance> RelatedFacilities { get; set; } = new List<FacilityInstance>();
public virtual HashSet<BatchRecord> BatchRecords { get; set; } = new HashSet<BatchRecord>();
}
Then my SQL Server-related code. The TVP structure:
CREATE TYPE dbo.RecordImportStructure
AS TABLE (
ItemName VARCHAR(MAX),
Supplier VARCHAR(MAX),
Quantity DECIMAL(18, 2),
ItemUnit VARCHAR(MAX),
EntityUnit VARCHAR(MAX),
ItemSize DECIMAL(18, 2),
PackageSize DECIMAL(18, 2),
FamilyCode VARCHAR(4),
Family VARCHAR(MAX),
CategoryCode VARCHAR(MAX),
Category VARCHAR(MAX),
SubCategoryCode VARCHAR(MAX),
SubCategory VARCHAR(MAX),
ItemGroupCode VARCHAR(MAX),
ItemGroup VARCHAR(MAX),
PurchaseValue DECIMAL(18, 2),
UnitPurchaseValue DECIMAL(18, 2),
PackagePurchaseValue DECIMAL(18, 2),
FacilityCode VARCHAR(MAX),
CurrencyCode VARCHAR(MAX)
);
The insert stored procedure:
CREATE PROCEDURE dbo.ImportBatchRecords (
@BatchId INT,
@ImportTable dbo.RecordImportStructure READONLY
)
AS
SET NOCOUNT ON;
DECLARE @ErrorCode int
DECLARE @Step varchar(200)
--Clear old stuff?
--TRUNCATE TABLE dbo.BatchRecords;
INSERT INTO dbo.BatchRecords (
ItemName,
Supplier,
Quantity,
ItemUnit,
EntityUnit,
ItemSize,
PackageSize,
FamilyCode,
Family,
CategoryCode,
Category,
SubCategoryCode,
SubCategory,
ItemGroupCode,
ItemGroup,
PurchaseValue,
UnitPurchaseValue,
PackagePurchaseValue,
DataBatchId,
FacilityInstanceId,
CurrencyId
)
OUTPUT INSERTED.Id
SELECT
ItemName,
Supplier,
Quantity,
ItemUnit,
EntityUnit,
ItemSize,
PackageSize,
FamilyCode,
Family,
CategoryCode,
Category,
SubCategoryCode,
SubCategory,
ItemGroupCode,
ItemGroup,
PurchaseValue,
UnitPurchaseValue,
PackagePurchaseValue,
@BatchId,
--FacilityInstanceId,
--CurrencyId
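-- Qualify the outer columns with the TVP alias; an unqualified FacilityCode or
-- CurrencyCode binds to the subquery's own table, so the lookup never correlates.
(SELECT TOP 1 f.Id from dbo.FacilityInstances f WHERE f.FacilityCode=IT.FacilityCode),
(SELECT TOP 1 c.Id from dbo.Currencies c WHERE c.CurrencyCode=IT.CurrencyCode)
FROM @ImportTable IT;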
Finally, my quick, test-only solution for executing this from the .NET side:
public class BatchRecordDataHandler : IBulkDataHandler<BatchRecordImportItem>
{
public async Task<int> ImportAsync(SqlConnection conn, SqlTransaction transaction, IEnumerable<BatchRecordImportItem> src)
{
using (var cmd = new SqlCommand())
{
cmd.CommandText = "ImportBatchRecords";
cmd.Connection = conn;
cmd.Transaction = transaction;
cmd.CommandType = CommandType.StoredProcedure;
cmd.CommandTimeout = 600;
var batchIdParam = new SqlParameter
{
ParameterName = "@BatchId",
SqlDbType = SqlDbType.Int,
Value = 1
};
var tableParam = new SqlParameter
{
ParameterName = "@ImportTable",
TypeName = "dbo.RecordImportStructure",
SqlDbType = SqlDbType.Structured,
Value = DataToSqlRecords(src)
};
cmd.Parameters.Add(batchIdParam);
cmd.Parameters.Add(tableParam);
cmd.Transaction = transaction;
using (var res = await cmd.ExecuteReaderAsync())
{
var resultTable = new DataTable();
resultTable.Load(res);
var cnt = resultTable.AsEnumerable().Count();
return cnt;
}
}
}
private IEnumerable<SqlDataRecord> DataToSqlRecords(IEnumerable<BatchRecordImportItem> src)
{
var tvpSchema = new[] {
new SqlMetaData("ItemName", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("Supplier", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("Quantity", SqlDbType.Decimal),
new SqlMetaData("ItemUnit", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("EntityUnit", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("ItemSize", SqlDbType.Decimal),
new SqlMetaData("PackageSize", SqlDbType.Decimal),
new SqlMetaData("FamilyCode", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("Family", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("CategoryCode", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("Category", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("SubCategoryCode", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("SubCategory", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("ItemGroupCode", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("ItemGroup", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("PurchaseValue", SqlDbType.Decimal),
new SqlMetaData("UnitPurchaseValue", SqlDbType.Decimal),
new SqlMetaData("PackagePurchaseValue", SqlDbType.Decimal),
new SqlMetaData("FacilityInstanceId", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("CurrencyId", SqlDbType.VarChar, SqlMetaData.Max),
};
var dataRecord = new SqlDataRecord(tvpSchema);
foreach (var importItem in src)
{
dataRecord.SetValues(importItem.ItemName,
importItem.Supplier,
importItem.Quantity,
importItem.ItemUnit,
importItem.EntityUnit,
importItem.ItemSize,
importItem.PackageSize,
importItem.FamilyCode,
importItem.Family,
importItem.CategoryCode,
importItem.Category,
importItem.SubCategoryCode,
importItem.SubCategory,
importItem.ItemGroupCode,
importItem.ItemGroup,
importItem.PurchaseValue,
importItem.UnitPurchaseValue,
importItem.PackagePurchaseValue,
importItem.FacilityCode,
importItem.CurrencyCode);
yield return dataRecord;
}
}
}
The import entity structure:
public class BatchRecordImportItem
{
public string ItemName { get; set; }
public string Supplier { get; set; }
public decimal Quantity { get; set; }
public string ItemUnit { get; set; }
public string EntityUnit { get; set; }
public decimal ItemSize { get; set; }
public decimal PackageSize { get; set; }
public string FamilyCode { get; set; }
public string Family { get; set; }
public string CategoryCode { get; set; }
public string Category { get; set; }
public string SubCategoryCode { get; set; }
public string SubCategory { get; set; }
public string ItemGroupCode { get; set; }
public string ItemGroup { get; set; }
public decimal PurchaseValue { get; set; }
public decimal UnitPurchaseValue { get; set; }
public decimal PackagePurchaseValue { get; set; }
public int DataBatchId { get; set; }
public string FacilityCode { get; set; }
public string CurrencyCode { get; set; }
}
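Finally, please don't mind the useless data reader; it doesn't really do much. Without the reader, inserting 2.5kk (2.5 million) rows takes around 26 minutes, while SqlBulkCopy took roughly 6 minutes. Is there something I'm doing fundamentally wrong? In case it matters, I'm using IsolationLevel.Snapshot. I'm on SQL Server 2014 and am free to change the database structure and indexes.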
UPD 1
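Made a couple of adjustment/improvement attempts as described by @Xedni, specifically: changed VARCHAR(MAX) to VARCHAR(*SomeValue*), and used a DataTable instead of IEnumerable<SqlDataRecord>.
My structure now looks like this: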
CREATE TYPE dbo.RecordImportStructure
AS TABLE (
ItemName VARCHAR(4096),
Supplier VARCHAR(450),
Quantity DECIMAL(18, 2),
ItemUnit VARCHAR(2048),
EntityUnit VARCHAR(2048),
ItemSize DECIMAL(18, 2),
PackageSize DECIMAL(18, 2),
FamilyCode VARCHAR(16),
Family VARCHAR(512),
CategoryCode VARCHAR(16),
Category VARCHAR(512),
SubCategoryCode VARCHAR(16),
SubCategory VARCHAR(512),
ItemGroupCode VARCHAR(16),
ItemGroup VARCHAR(512),
PurchaseValue DECIMAL(18, 2),
UnitPurchaseValue DECIMAL(18, 2),
PackagePurchaseValue DECIMAL(18, 2),
FacilityCode VARCHAR(450),
CurrencyCode VARCHAR(4)
);
So far there is no noticeable performance gain: still 26-28 minutes, as before.
UPD 3
Added OPTION (RECOMPILE); at the end of my SP and got a minor boost; it now sits at ~25 minutes for 2.5kk rows.
Answer 0 (score: 3)
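You can set trace flag 2453:
FIX: Poor performance when you use table variables in SQL Server 2012 or SQL Server 2014
When you use a table variable in a batch or procedure, the query is compiled and optimized for the initial empty state of the table variable. If this table variable is populated with many rows at runtime, the pre-compiled query plan may no longer be optimal. For example, the query may be joining a table variable with a nested loop, since that is usually more efficient for a small number of rows. This query plan can be inefficient if the table variable has millions of rows. A hash join may be a better choice under such conditions. To get a new query plan, the query needs to be recompiled. Unlike other user or temporary tables, however, a row count change in a table variable does not trigger a query recompile. Typically, you can work around this with OPTION (RECOMPILE), which has its own overhead cost. Trace flag 2453 allows the benefit of query recompile without OPTION (RECOMPILE). This trace flag differs from OPTION (RECOMPILE) in two main aspects. (1) It uses the same row count threshold as other tables. The query does not need to be compiled for every execution, unlike with OPTION (RECOMPILE). It triggers a recompile only when the row count change exceeds the predefined threshold. (2) OPTION (RECOMPILE) forces the query to peek parameters and optimizes the query for them. This trace flag does not force parameter peeking.
You can turn on trace flag 2453 to allow a table variable to trigger a recompile when enough rows have changed. This may allow the query optimizer to choose a more efficient plan.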
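As a minimal sketch of what turning the flag on could look like, the statements below enable it globally, check its status, and switch it off again. This assumes a login with sysadmin rights and a build that includes the fix named in the KB article; it is an illustration, not a recommendation to leave the flag on in production without testing.

```sql
-- Enable trace flag 2453 for the whole instance (-1 = global scope).
-- The setting does not survive a restart unless -T2453 is also added
-- as a startup parameter.
DBCC TRACEON (2453, -1);

-- Report whether the flag is currently enabled.
DBCC TRACESTATUS (2453);

-- Turn the flag off again once testing is done.
DBCC TRACEOFF (2453, -1);
```

With the flag on, the import procedure can be re-run unchanged to see whether the plan for the INSERT ... SELECT moves away from nested loops.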
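Answer 1 (score: 2)
I'd guess your proc could use some love. It's hard to say for certain without seeing an execution plan, but here are some thoughts.
SQL Server always assumes that a table variable (which is essentially what a table-valued parameter is) contains exactly 1 row, even if it doesn't (absent a statement-level recompile). In many cases this doesn't matter, but you have two correlated subqueries in your insert list, and that is where I'd focus my attention. Because of the cardinality estimate, it's more than likely hammering that poor table variable with a bunch of nested loop joins. I would consider putting the rows from your TVP into a temp table, updating the temp table with the IDs from FacilityInstances and Currencies, and then doing your final insert from that.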
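The staging approach described above might be sketched roughly as follows. The procedure name dbo.ImportBatchRecords_Staged and the temp table #Staging are hypothetical names chosen for illustration; the table, column, and type names come from the question. It is a sketch under those assumptions, not a measured improvement.

```sql
-- Sketch only: dbo.ImportBatchRecords_Staged and #Staging are hypothetical names.
CREATE PROCEDURE dbo.ImportBatchRecords_Staged (
    @BatchId INT,
    @ImportTable dbo.RecordImportStructure READONLY
)
AS
BEGIN
    SET NOCOUNT ON;

    -- 1. Copy the TVP rows into a temp table. Unlike a table variable, a temp
    --    table has statistics, so later statements see a realistic row count.
    SELECT
        ItemName, Supplier, Quantity, ItemUnit, EntityUnit, ItemSize, PackageSize,
        FamilyCode, Family, CategoryCode, Category, SubCategoryCode, SubCategory,
        ItemGroupCode, ItemGroup, PurchaseValue, UnitPurchaseValue, PackagePurchaseValue,
        FacilityCode, CurrencyCode,
        CAST(NULL AS INT) AS FacilityInstanceId,
        CAST(NULL AS INT) AS CurrencyId
    INTO #Staging
    FROM @ImportTable;

    -- 2. Resolve both foreign keys with set-based updates.
    --    If a code is not unique in its lookup table, an arbitrary match wins.
    UPDATE s
    SET FacilityInstanceId = f.Id
    FROM #Staging AS s
    JOIN dbo.FacilityInstances AS f ON f.FacilityCode = s.FacilityCode;

    UPDATE s
    SET CurrencyId = c.Id
    FROM #Staging AS s
    JOIN dbo.Currencies AS c ON c.CurrencyCode = s.CurrencyCode;

    -- 3. Final insert from the staged rows, returning the generated Ids.
    INSERT INTO dbo.BatchRecords (
        ItemName, Supplier, Quantity, ItemUnit, EntityUnit, ItemSize, PackageSize,
        FamilyCode, Family, CategoryCode, Category, SubCategoryCode, SubCategory,
        ItemGroupCode, ItemGroup, PurchaseValue, UnitPurchaseValue, PackagePurchaseValue,
        DataBatchId, FacilityInstanceId, CurrencyId
    )
    OUTPUT INSERTED.Id
    SELECT
        ItemName, Supplier, Quantity, ItemUnit, EntityUnit, ItemSize, PackageSize,
        FamilyCode, Family, CategoryCode, Category, SubCategoryCode, SubCategory,
        ItemGroupCode, ItemGroup, PurchaseValue, UnitPurchaseValue, PackagePurchaseValue,
        @BatchId, FacilityInstanceId, CurrencyId
    FROM #Staging;
END;
```

Rows whose codes match nothing keep NULL in the resolved columns, so the final insert fails on the non-nullable foreign keys, much as a lookup subquery that returns NULL would.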
Answer 2 (score: 2)
Try the following stored procedure:
CREATE PROCEDURE dbo.ImportBatchRecords (
@BatchId INT,
@ImportTable dbo.RecordImportStructure READONLY
)
AS
SET NOCOUNT ON;
DECLARE @ErrorCode int
DECLARE @Step varchar(200)
CREATE TABLE #FacilityInstances
(
Id int NOT NULL,
FacilityCode varchar(512) NOT NULL UNIQUE WITH (IGNORE_DUP_KEY=ON)
);
CREATE TABLE #Currencies
(
Id int NOT NULL,
CurrencyCode varchar(512) NOT NULL UNIQUE WITH (IGNORE_DUP_KEY = ON)
)
INSERT INTO #FacilityInstances(Id, FacilityCode)
SELECT Id, FacilityCode FROM dbo.FacilityInstances
WHERE FacilityCode IS NOT NULL AND Id IS NOT NULL;
INSERT INTO #Currencies(Id, CurrencyCode)
SELECT Id, CurrencyCode FROM dbo.Currencies
WHERE CurrencyCode IS NOT NULL AND Id IS NOT NULL
INSERT INTO dbo.BatchRecords (
ItemName,
Supplier,
Quantity,
ItemUnit,
EntityUnit,
ItemSize,
PackageSize,
FamilyCode,
Family,
CategoryCode,
Category,
SubCategoryCode,
SubCategory,
ItemGroupCode,
ItemGroup,
PurchaseValue,
UnitPurchaseValue,
PackagePurchaseValue,
DataBatchId,
FacilityInstanceId,
CurrencyId
)
OUTPUT INSERTED.Id
SELECT
ItemName,
Supplier,
Quantity,
ItemUnit,
EntityUnit,
ItemSize,
PackageSize,
FamilyCode,
Family,
CategoryCode,
Category,
SubCategoryCode,
SubCategory,
ItemGroupCode,
ItemGroup,
PurchaseValue,
UnitPurchaseValue,
PackagePurchaseValue,
@BatchId,
F.Id,
C.Id
FROM
#FacilityInstances F RIGHT OUTER HASH JOIN
(
#Currencies C
RIGHT OUTER HASH JOIN @ImportTable IT
ON C.CurrencyCode = IT.CurrencyCode
)
ON F.FacilityCode = IT.FacilityCode
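This forces the execution plan to use hash match joins instead of nested loops. I think the culprit behind the poor performance is the first nested loop, which runs once for each row in @ImportTable.
I don't know whether CurrencyCode is unique in the Currencies table, so I create the temporary table #Currencies with unique currency codes.
I don't know whether FacilityCode is unique in the FacilityInstances table, so I create the temporary table #FacilityInstances with unique facility codes.
If they are unique, you don't need the temporary tables; you can use the permanent tables directly.
Assuming CurrencyCode and FacilityCode are unique, the following stored procedure would be better, because it doesn't create unnecessary temporary tables: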
CREATE PROCEDURE dbo.ImportBatchRecords (
@BatchId INT,
@ImportTable dbo.RecordImportStructure READONLY
)
AS
SET NOCOUNT ON;
DECLARE @ErrorCode int
DECLARE @Step varchar(200)
INSERT INTO dbo.BatchRecords (
ItemName,
Supplier,
Quantity,
ItemUnit,
EntityUnit,
ItemSize,
PackageSize,
FamilyCode,
Family,
CategoryCode,
Category,
SubCategoryCode,
SubCategory,
ItemGroupCode,
ItemGroup,
PurchaseValue,
UnitPurchaseValue,
PackagePurchaseValue,
DataBatchId,
FacilityInstanceId,
CurrencyId
)
OUTPUT INSERTED.Id
SELECT
ItemName,
Supplier,
Quantity,
ItemUnit,
EntityUnit,
ItemSize,
PackageSize,
FamilyCode,
Family,
CategoryCode,
Category,
SubCategoryCode,
SubCategory,
ItemGroupCode,
ItemGroup,
PurchaseValue,
UnitPurchaseValue,
PackagePurchaseValue,
@BatchId,
F.Id,
C.Id
FROM
dbo.FacilityInstances F RIGHT OUTER HASH JOIN
(
dbo.Currencies C
RIGHT OUTER HASH JOIN @ImportTable IT
ON C.CurrencyCode = IT.CurrencyCode
)
ON F.FacilityCode = IT.FacilityCode
Answer 3 (score: 1)
Well... why not just use SqlBulkCopy? There are plenty of solutions out there that help you convert a collection of entities into an IDataReader object that can be handed directly to SqlBulkCopy.
This is a good start...
Then it becomes as simple as...
SqlBulkCopy bulkCopy = new SqlBulkCopy(connection);
IDataReader dataReader = storeEntities.AsDataReader();
bulkCopy.WriteToServer(dataReader);
I've used this code; the one caveat is that you need to be quite careful about the definition of your entity. The order of the properties in the entity determines the order of the columns exposed by the IDataReader, and this needs to match the order of the columns in the table that you are bulk copying to.
Alternatively there's other code here..
https://www.codeproject.com/Tips/1114089/Entity-Framework-Performance-Tuning-Using-SqlBulkC
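Answer 4 (score: 0)
I know there is an accepted answer, but I couldn't resist. I believe you can improve performance by 20-50% over the accepted answer.
The key is to SqlBulkCopy directly into the final table, dbo.BatchRecords.
To make that happen you need FacilityInstanceId and CurrencyId before the SqlBulkCopy. To get them, load SELECT Id, FacilityCode FROM FacilityInstances and SELECT Id, CurrencyCode FROM Currencies into collections, then build dictionaries: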
var facilityIdByFacilityCode = facilitiesCollection.ToDictionary(x => x.FacilityCode, x => x.Id);
var currencyIdByCurrencyCode = currenciesCollection.ToDictionary(x => x.CurrencyCode, x => x.Id);
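Once you have the dictionaries, getting an id from a code costs constant time. This is equivalent, and very similar, to a HASH MATCH JOIN in SQL Server, but done on the client side.
The other barrier you need to tear down is getting the Id column of the newly inserted rows in the dbo.BatchRecords table. Actually, you can get the Ids before inserting the rows.
Make the Id column "sequence driven":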
CREATE SEQUENCE BatchRecords_Id_Seq START WITH 1;
CREATE TABLE BatchRecords
(
Id int NOT NULL CONSTRAINT DF_BatchRecords_Id DEFAULT (NEXT VALUE FOR BatchRecords_Id_Seq),
.....
CONSTRAINT PK_BatchRecords PRIMARY KEY (Id)
)
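You have the BatchRecords collection, so you know how many records it contains. You can then reserve a contiguous range of sequence values. Execute the following T-SQL: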
DECLARE @BatchCollectionCount int = 2500 -- Replace with the actual value
DECLARE @range_first_value sql_variant
DECLARE @range_last_value sql_variant
EXEC sp_sequence_get_range
@sequence_name = N'BatchRecords_Id_Seq',
@range_size = @BatchCollectionCount,
@range_first_value = @range_first_value OUTPUT,
@range_last_value = @range_last_value OUTPUT
SELECT
CAST(@range_first_value AS INT) AS range_first_value,
CAST(@range_last_value AS int) as range_last_value
This returns range_first_value and range_last_value. You can now assign BatchRecord.Id to each record:
int id = range_first_value;
foreach (var record in batchRecords)
{
record.Id = id++;
}
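Next, you can SqlBulkCopy the batch record collection directly into the final table, dbo.BatchRecords.
To get a DataReader from an IEnumerable<T> to feed SqlBulkCopy.WriteToServer, you can use code like this, which is part of EntityLite, a micro ORM I developed.
You can make it even faster if you cache facilityIdByFacilityCode and currencyIdByCurrencyCode. To make sure these dictionaries stay up to date, you can use SqlDependency or techniques like this one.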