How can I efficiently normalize imported data?

Asked: 2014-02-24 10:04:45

Tags: sql-server sql-server-2012

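I have a flat file containing denormalized data. For good reasons, I can't change that. I need to load it into normalized, related tables for use in LightSwitch. The data does not contain any of the identity column values from the original tables. I have four columns: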

Division    Branch    Position    Location

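The schema for the normalized data is: Divisions contain Branches. Branches contain Positions. Positions and Locations have a many-to-many relationship through the PositionLocationMappings table.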
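For reference, the target tables might look roughly like the sketch below. The table and column names are the ones used by the stored procedure shown later in this question; the data types, the constraints, and the Id column on the mapping table are assumptions.

```sql
-- Hypothetical DDL: names follow the stored procedure in this question;
-- data types, keys, and constraints are assumed for illustration.
CREATE TABLE [dbo].[Divisions] (
    Id               int IDENTITY(1,1) PRIMARY KEY,
    DivisionName     nvarchar(240) NOT NULL,
    IsDivisionActive bit NOT NULL
);

CREATE TABLE [dbo].[Branches] (
    Id             int IDENTITY(1,1) PRIMARY KEY,
    BranchName     nvarchar(240) NOT NULL,
    IsBranchActive bit NOT NULL,
    DivisionId     int NOT NULL REFERENCES [dbo].[Divisions] (Id)
);

CREATE TABLE [dbo].[Positions] (
    Id               int IDENTITY(1,1) PRIMARY KEY,
    PositionName     nvarchar(240) NOT NULL,
    IsPositionActive bit NOT NULL,
    BranchId         int NOT NULL REFERENCES [dbo].[Branches] (Id)
);

CREATE TABLE [dbo].[Locations] (
    Id               int IDENTITY(1,1) PRIMARY KEY,
    LocationName     nvarchar(60) NOT NULL,
    IsLocationActive bit NOT NULL
);

-- Mapping table for the many-to-many relationship (Id column is assumed).
CREATE TABLE [dbo].[PositionLocationMappings] (
    Id         int IDENTITY(1,1) PRIMARY KEY,
    PositionId int NOT NULL REFERENCES [dbo].[Positions] (Id),
    LocationId int NOT NULL REFERENCES [dbo].[Locations] (Id)
);
```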

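I do a BULK INSERT to load the denormalized data into a table. Then I process it row by row, calling a stored procedure for each row. The source file has about 16,000 rows and the import takes 27 seconds, which seems a bit slow. Is there a way to do it faster?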
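The staging load looks roughly like the sketch below. The file path and the file layout (tab-delimited with a header row) are placeholders, and the column sizes are assumed to match the variables declared in the script.

```sql
-- Staging table for the raw rows. Assumes the [staging] schema already exists;
-- column sizes mirror the variables used in the script (an assumption).
CREATE TABLE [staging].[BranchPositions] (
    Division nvarchar(240),
    Branch   nvarchar(240),
    Position nvarchar(240),
    Location nvarchar(60)
);

-- Load the flat file in one statement.
BULK INSERT [staging].[BranchPositions]
FROM 'C:\import\BranchPositions.txt'   -- hypothetical path
WITH (
    FIELDTERMINATOR = '\t',   -- assumed tab-delimited
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,      -- assumed header row to skip
    TABLOCK
);
```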

This is in my PostDeployment.sql script:

DECLARE @division nvarchar(240)
DECLARE @branch nvarchar(240)
DECLARE @position nvarchar(240)
DECLARE @location nvarchar(60)

DECLARE myCursor CURSOR LOCAL FOR
    SELECT DISTINCT Division,Branch,Position,Location
        FROM [staging].BranchPositions

OPEN myCursor

FETCH NEXT FROM myCursor INTO @division, @branch, @position, @location

WHILE @@FETCH_STATUS = 0 BEGIN
    EXECUTE [dbo].[usp_InsertBranchPositions] @division,@branch,@position,@location

    FETCH NEXT FROM myCursor INTO @division, @branch, @position, @location
END

CLOSE myCursor
DEALLOCATE myCursor

This is the stored procedure:

ALTER PROCEDURE [dbo].[usp_InsertBranchPositions] 
    @division nvarchar(240),
    @branch   nvarchar(240),
    @position nvarchar(240),
    @location nvarchar(60)
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRANSACTION

    DECLARE @divisionTable TABLE (InsertedDivisionId int)
    DECLARE @branchTable   TABLE (InsertedBranchId   int)
    DECLARE @positionTable TABLE (InsertedPositionId int)
    DECLARE @locationTable TABLE (InsertedLocationid int)

    DECLARE @divisionId int
    DECLARE @branchId   int
    DECLARE @positionId int
    DECLARE @locationId int

    SELECT @divisionId = [Id] FROM [dbo].[Divisions]
     WHERE DivisionName = @division

    IF @divisionId IS NULL
    BEGIN
        INSERT INTO [dbo].[Divisions] (DivisionName, IsDivisionActive)
             VALUES (@division, 1)
        SELECT @divisionId = SCOPE_IDENTITY()
    END

    SELECT @branchId = [Id] FROM [dbo].[Branches]
     WHERE BranchName = @branch

    IF @branchId IS NULL
    BEGIN
        INSERT INTO [dbo].[Branches] (BranchName, IsBranchActive, DivisionId)
             VALUES (@branch, 1, @divisionId)
        SELECT @branchId = SCOPE_IDENTITY()
    END

    SELECT @positionId = [Id] FROM [dbo].[Positions]
     WHERE PositionName = @position

    IF @positionId IS NULL
    BEGIN
        INSERT INTO [dbo].[Positions] (PositionName, IsPositionActive, BranchId)
             VALUES (@position, 1, @branchId)
        SELECT @positionId = SCOPE_IDENTITY()
    END

    SELECT @locationId = [Id] FROM [dbo].[Locations]
     WHERE LocationName = @location

    IF @locationId IS NULL
    BEGIN
        INSERT INTO [dbo].[Locations] (LocationName, IsLocationActive)
             VALUES (@location, 1)
        SELECT @locationId = SCOPE_IDENTITY()
    END

    INSERT INTO [dbo].[PositionLocationMappings] (PositionId, LocationId)
         VALUES (@positionId, @locationId)

    COMMIT TRANSACTION
END

1 answer:

Answer 0 (score: 2)

Instead of calling a procedure for each row, you can import the data with set-based operations.

For example, you could replace the work done by this snippet:

IF @divisionId IS NULL
BEGIN
    INSERT INTO [dbo].[Divisions] (DivisionName, IsDivisionActive)
         VALUES (@division, 1)
    SELECT @divisionId = SCOPE_IDENTITY()
END

with:

insert  Divisions 
        (DivisionName, IsDivisionActive)
select  distinct Division
,       1
from    staging.BranchPositions

Then, for Branches, you can use a join to look up the DivisionId:

insert  Branches
        (BranchName, IsBranchActive, DivisionId)
select  distinct bp.Branch
,       1
,       d.Id
from    staging.BranchPositions bp
join    Divisions d
on      bp.Division = d.DivisionName

And so on. This should be much faster; I have used this approach to import billions of rows within a minute.
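The remaining tables could follow the same pattern. The sketch below assumes the staging table and column names from the question ([staging].BranchPositions with Division, Branch, Position, Location). The NOT EXISTS guards are an addition so the script can be re-run without creating duplicates. Unlike the original procedure, which looks rows up by name alone, this version treats a position as belonging to a specific branch.

```sql
-- Positions: resolve BranchId by walking Division -> Branch.
INSERT INTO [dbo].[Positions] (PositionName, IsPositionActive, BranchId)
SELECT DISTINCT bp.Position, 1, b.Id
FROM   [staging].[BranchPositions] AS bp
JOIN   [dbo].[Divisions] AS d ON d.DivisionName = bp.Division
JOIN   [dbo].[Branches]  AS b ON b.BranchName = bp.Branch
                             AND b.DivisionId = d.Id
WHERE  NOT EXISTS (SELECT 1
                   FROM   [dbo].[Positions] AS p
                   WHERE  p.PositionName = bp.Position
                     AND  p.BranchId     = b.Id);

-- Locations: no parent table, so no join is needed.
INSERT INTO [dbo].[Locations] (LocationName, IsLocationActive)
SELECT DISTINCT bp.Location, 1
FROM   [staging].[BranchPositions] AS bp
WHERE  NOT EXISTS (SELECT 1
                   FROM   [dbo].[Locations] AS l
                   WHERE  l.LocationName = bp.Location);

-- Mappings: join every staging row back to the generated ids.
INSERT INTO [dbo].[PositionLocationMappings] (PositionId, LocationId)
SELECT DISTINCT p.Id, l.Id
FROM   [staging].[BranchPositions] AS bp
JOIN   [dbo].[Divisions] AS d ON d.DivisionName = bp.Division
JOIN   [dbo].[Branches]  AS b ON b.BranchName   = bp.Branch
                             AND b.DivisionId   = d.Id
JOIN   [dbo].[Positions] AS p ON p.PositionName = bp.Position
                             AND p.BranchId     = b.Id
JOIN   [dbo].[Locations] AS l ON l.LocationName = bp.Location
WHERE  NOT EXISTS (SELECT 1
                   FROM   [dbo].[PositionLocationMappings] AS m
                   WHERE  m.PositionId = p.Id
                     AND  m.LocationId = l.Id);
```

The statements have to run in this order, after Divisions and Branches are populated, because each one reads the ids generated by the previous step.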