SQL:有效地将增量数附加到字符串,避免重复

时间:2018-03-12 10:49:52

标签: sql sql-server performance

我有一组记录(表[#tmp_origin]),在字符串字段中包含重复的条目([Names])。我想将[#tmp_origin]的全部内容插入到目标表[#tmp_destination]中,该表不允许重复,并且可能已包含项目。

如果目标表中不存在源表中的字符串,则只需将in插入到目标表中即可。 如果目标表中的条目已存在且原始表中的条目值相同,则在将字符串插入目标表之前,必须将字符串附加的增量编号附加到该字符串。

在此示例脚本中,使用游标实现了以这种方式移动数据的过程:


-- create initial situation (origin and destination table, both containing items) - Begin

    CREATE TABLE [#tmp_origin] ([Names] VARCHAR(10))
    CREATE TABLE [#tmp_destination] ([Names] VARCHAR(10))
    CREATE UNIQUE INDEX [IX_UniqueName] ON [#tmp_destination]([Names] ASC)



    INSERT INTO [#tmp_origin]([Names]) VALUES ('a')
    INSERT INTO [#tmp_origin]([Names]) VALUES ('a')
    INSERT INTO [#tmp_origin]([Names]) VALUES ('b')
    INSERT INTO [#tmp_origin]([Names]) VALUES ('c')


    INSERT INTO [#tmp_destination]([Names]) VALUES ('a')
    INSERT INTO [#tmp_destination]([Names]) VALUES ('a_1')
    INSERT INTO [#tmp_destination]([Names]) VALUES ('b')

-- create initial situation - End

    DECLARE @Name VARCHAR(10)

    DECLARE NamesCursor CURSOR LOCAL FORWARD_ONLY FAST_FORWARD READ_ONLY FOR
        SELECT [Names]
        FROM [#tmp_origin];
    OPEN NamesCursor;
    FETCH NEXT FROM NamesCursor INTO @Name;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        DECLARE @finalName VARCHAR(10)
        SET @finalName = @Name
        DECLARE @counter INT
        SET @counter = 1

        WHILE(1=1)
        BEGIN
            IF NOT EXISTS(SELECT * FROM [#tmp_destination] WHERE [Names] = @finalName)
                BREAK;

            SET @finalName = @Name + '_' + CAST(@counter AS VARCHAR)
            SET @counter = @counter + 1
        END
        INSERT INTO [#tmp_destination] ([Names]) (
            SELECT @finalName
        )

        FETCH NEXT FROM NamesCursor INTO @Name;
    END

    CLOSE NamesCursor;
    DEALLOCATE NamesCursor;




    SELECT *
    FROM [#tmp_destination]

    /*
    Expected result:
    a
    a_1
    a_2
    a_3
    b
    b_1
    c
    */

    DROP TABLE [#tmp_origin]
    DROP TABLE [#tmp_destination]


这可以正常工作,但是当要插入的项目数量增加时,其性能会大幅下降。

有什么想加快速度吗?

感谢

3 个答案:

答案 0 :(得分:5)

使用窗口功能可以对重复项进行编号。您还可以从目的地表中获取计数(需要条件来剥去您添加的后缀):

select orig.names,
       row_number() over (partition by orig.names order by orig.names) as rowNo,
       dest.count
from ##tmp_origin orig
  cross apply (select count(1) from #tmp_destination where names = orig.names) as dest

可以从上面构建insert(如果大于零,则新后缀为rowNo + dest.count -1。)

建议您重构目标临时表,将名称和后缀包含在单独的列中 - 这可能意味着有一个新的中间阶段 - 因为这将使匹配逻辑更加简单。

答案 1 :(得分:1)

这样的事情:

insert  [#tmp_destination]
select  CASE WHEN row_number() over(partition by Names order by Names) > 1 THEN Names + '_' + CONVERT(VARCHAR(10), row_number() over(partition by Names order by Names)) ELSE Names END
from    [#tmp_origin]

答案 2 :(得分:1)

在这种情况下,我不会使用游标。相反,我将使用ROW_NUMBER()构建查询。这样您就可以在原始表中添加一个计数器,然后使用此计数器附加到[Names]:

SELECT [Names], ROW_NUMBER() OVER (PARTITION BY [Names] ORDER BY [Names]) - 1 AS [counter]
INTO #tmp_origin_with_counter
FROM #tmp_origin

SELECT CONCAT([Names], IIF([counter] = 0, '', '_'+ CAST([counter] AS NVARCHAR)))
INTO #tmp_destination
FROM #tmp_origin_with_counter