Question

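I have a table 'Objects' in a SQL Server database. It contains the names (strings) of objects. I have a list of names of new objects, held in a separate table 'NewObjects', that need to be inserted into the 'Objects' table. This operation is called the "import" from here on.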

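For each record imported from 'NewObjects' into 'Objects', if its name already exists in 'Objects', I need to generate a unique name for it. The new name is stored in the 'NewObjects' table, next to the old name.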

DECLARE @NewObjects TABLE
(
    ...
    Name varchar(20),
    newName nvarchar(20)
)

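I have implemented a stored procedure that generates a unique name for each record imported from 'NewObjects'. However, I am not happy with its performance for 1000 records (in 'NewObjects'), and I would like help optimizing the code. Here is the implementation: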

PROCEDURE [dbo].[importWithNewNames] @args varchar(MAX)

-- Example @args: 'A,B,C,D' (a CSV string)
...


DECLARE @NewObjects TABLE
(
    _index int identity PRIMARY KEY,
    Name varchar(20),
    newName nvarchar(20)
)

-- 'SplitString' function: this is a working implementation, and its performance is not a concern right now
INSERT INTO @NewObjects (Name)
SELECT * from SplitString(@args, ',')

declare @beg int = 1
declare @end int
DECLARE @oldName varchar(10)

-- get the count of the rows
select @end = MAX(_index) from @NewObjects

while @beg <= @end
BEGIN
    select @oldName = Name from @NewObjects where @beg = _index

    Declare @nameExists int = 0

    -- this is our constant; we cannot change it
    DECLARE @MAX_NAME_WIDTH int = 5

    DECLARE @counter int = 1
    DECLARE @newName varchar(10)
    DECLARE @z varchar(10)

    select @nameExists = count(name) from Objects where name = @oldName
    ...
    IF @nameExists > 0
    BEGIN
        -- create name based on pattern 'Fxxxxx'. Example: 'F00001', 'F00002'.
        select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

        while EXISTS (select top 1 1 from Objects where name = @newName)
         OR EXISTS (select top 1 1 from @NewObjects where newName = @newName)
        BEGIN
            select @counter = @counter + 1
            select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
        END

        select top 1 @z = @newName from Objects

        update @NewObjects
        set newName = @z where @beg = _index
    END

    select @beg = @beg + 1
END

-- finally, show the new names generated
select * from @NewObjects

Answer 1

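Disclaimer: I can't test these suggestions, so there may be syntax errors that you will have to fix yourself when implementing them. They can serve both as a guide to fixing this procedure and as a way to sharpen your skills for future projects.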

One optimization stands out just from skimming, and it will matter more as you iterate over larger sets. It concerns this code:

select @nameExists = count(name) from Objects where name = @oldName
...
IF @nameExists > 0

Consider changing it to:

IF EXISTS (select name from Objects where name = @oldName)
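A minimal sketch of that change, runnable on its own: #Objects is a hypothetical temp table standing in for the real Objects table, and @needsNewName is an illustrative flag that is not part of the original procedure. EXISTS is satisfied as soon as one matching row is found, whereas COUNT(name) has to count every match before the IF can be evaluated.

```sql
-- Hypothetical stand-in for the real Objects table
IF OBJECT_ID('tempdb..#Objects') IS NOT NULL DROP TABLE #Objects;
CREATE TABLE #Objects (name varchar(20) NOT NULL);
INSERT INTO #Objects (name) VALUES ('A'), ('B'), ('F00001');

DECLARE @oldName varchar(20) = 'A';
DECLARE @needsNewName bit = 0;  -- illustrative flag, not part of the original procedure

-- EXISTS is satisfied by the first matching row;
-- COUNT(name) would have to count every match before the IF could be evaluated
IF EXISTS (SELECT 1 FROM #Objects WHERE name = @oldName)
    SET @needsNewName = 1;

SELECT @oldName AS oldName, @needsNewName AS needsNewName;
```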

Also, instead of doing this:

-- create name based on pattern 'Fxxxxx'. Example: 'F00001', 'F00002'.
select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

while EXISTS (select top 1 1 from Objects where name = @newName)
 OR EXISTS (select top 1 1 from @NewObjects where newName = @newName)
BEGIN
    select @counter = @counter + 1
    select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
END

consider this:

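DECLARE @maxName VARCHAR(20)
SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

-- largest existing Fxxxxx name (no ORDER BY: MAX already returns a single row)
SELECT @maxName = MAX(name) FROM Objects
WHERE name >= @newName AND name LIKE 'F' + REPLICATE('[0-9]', @MAX_NAME_WIDTH)
IF (@maxName IS NOT NULL)
BEGIN
    -- continue numbering after the largest name that is already taken
    SET @counter = CAST(SUBSTRING(@maxName, 2, @MAX_NAME_WIDTH) AS INT) + 1
    SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
END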

This ensures that you don't iterate and run multiple queries just to find the largest integer value used by the generated names.
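A self-contained sketch of the single-query idea, assuming a hypothetical #Objects temp table with made-up sample rows in place of the real Objects table. It filters with LIKE on the exact 'F' + digits pattern so that unrelated names such as 'Z' can never reach the CAST, and it adds 1 to the largest number found so that the resulting name is not one that is already taken.

```sql
-- Hypothetical sample data standing in for the real Objects table
IF OBJECT_ID('tempdb..#Objects') IS NOT NULL DROP TABLE #Objects;
CREATE TABLE #Objects (name varchar(20) NOT NULL);
INSERT INTO #Objects (name) VALUES ('A'), ('F00001'), ('F00007'), ('Z');

DECLARE @MAX_NAME_WIDTH int = 5;
DECLARE @counter int = 1;
DECLARE @maxName varchar(20);

-- One query: the largest name matching 'F' followed by exactly @MAX_NAME_WIDTH digits.
-- Because the width is fixed, the string MAX is also the numeric MAX.
SELECT @maxName = MAX(name)
FROM #Objects
WHERE name LIKE 'F' + REPLICATE('[0-9]', @MAX_NAME_WIDTH);

-- Continue numbering after the largest existing Fxxxxx name, if there is one
IF @maxName IS NOT NULL
    SET @counter = CAST(SUBSTRING(@maxName, 2, @MAX_NAME_WIDTH) AS int) + 1;

DECLARE @newName varchar(20) = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0');
SELECT @counter AS nextCounter, @newName AS nextName;  -- 8, 'F00008'
```

Note the trade-off: numbering continues after the largest existing name, so gaps such as F00002 to F00006 in this sample are never reused, whereas the original WHILE loop fills them.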

Also, based on the limited context I have, you should be able to make one more optimization so that the above only ever runs once:

DECLARE @maxName VARCHAR(20)
SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

IF (@beg = 1)
BEGIN
    -- only on the first iteration: find the largest existing Fxxxxx name
    SELECT @maxName = MAX(name) FROM Objects
    WHERE name >= @newName AND name LIKE 'F' + REPLICATE('[0-9]', @MAX_NAME_WIDTH)
    IF (@maxName IS NOT NULL)
    BEGIN
        SET @counter = CAST(SUBSTRING(@maxName, 2, @MAX_NAME_WIDTH) AS INT) + 1
        SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
    END
END

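The reason I say you can make this optimization is that, unless you have to worry about other processes inserting records that look like yours (e.g. Fxxxxx) in the meantime, you only need to find the MAX once and can then simply increment @counter in the loop.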

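In fact, you can pull the whole IF (@beg = 1) block out of the loop, and you should be able to work that out fairly easily. Just move the DECLARE of @maxName and the SET, together with the code inside the IF, out of the loop (along with the DECLARE of @counter, whose initializer would otherwise reset it to 1 on every iteration). But take it one step at a time.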

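Also, change this line:

select top 1 @z = @newName from Objects

to this:

SET @z = @newName

because the original line literally runs a query against Objects just to copy one local variable into another. This may be a significant contributor to your performance problem. As a matter of good practice, use SET for local variables unless you are actually assigning them from the result of a SELECT statement. This applies in a few other places in your code; consider this line: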
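A small illustration of the difference, using a hypothetical empty #Objects table: the SELECT ... FROM form has to touch the table just to copy one variable into another, and if the table has no rows the assignment does not happen at all, so the target variable keeps its previous value. SET never reads a table.

```sql
-- Hypothetical empty stand-in for Objects
IF OBJECT_ID('tempdb..#Objects') IS NOT NULL DROP TABLE #Objects;
CREATE TABLE #Objects (name varchar(20) NOT NULL);

DECLARE @newName varchar(10) = 'F00002';
DECLARE @zSelect varchar(10);
DECLARE @zSet varchar(10);

-- Reads #Objects only to copy a variable; with zero rows, @zSelect is never assigned
SELECT TOP 1 @zSelect = @newName FROM #Objects;

-- Plain variable-to-variable assignment, no table access
SET @zSet = @newName;

SELECT @zSelect AS viaSelectFrom, @zSet AS viaSet;  -- NULL, 'F00002'
```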

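select @beg = @beg + 1

Use this instead:

SET @beg = @beg + 1

Finally, as mentioned above, just increment @counter at the end of the loop, where you have this line:

SET @beg = @beg + 1

Just add this line:

SET @counter = @counter + 1

And you're golden!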

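So, to recap: you find the largest conflicting name once, which gets rid of all those iterations. You use SET to get rid of costly lines like select top 1 @z = @newName from Objects, where you are actually querying a table just to assign local variables. And you use EXISTS instead of assigning a variable with the COUNT aggregate function to do the job.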
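Putting the suggestions together, one possible shape of the loop is sketched below. It is not the answerer's exact code: #Objects and the literal rows are hypothetical stand-ins for the real Objects table and SplitString(@args, ','), and @counter is incremented inside the IF rather than at the very end of the loop, so rows that keep their name do not consume a number. It assumes no other session inserts Fxxxxx names while the loop runs.

```sql
-- Hypothetical test data; #Objects stands in for the real Objects table
IF OBJECT_ID('tempdb..#Objects') IS NOT NULL DROP TABLE #Objects;
CREATE TABLE #Objects (name varchar(20) NOT NULL);
INSERT INTO #Objects (name) VALUES ('A'), ('B'), ('F00001');

DECLARE @NewObjects TABLE
(
    _index int identity PRIMARY KEY,
    Name varchar(20),
    newName nvarchar(20)
);
-- Literal rows instead of SplitString(@args, ',')
INSERT INTO @NewObjects (Name) VALUES ('A'), ('B'), ('C'), ('D'), ('F00001');

-- All declarations hoisted out of the loop
DECLARE @MAX_NAME_WIDTH int = 5;
DECLARE @counter int = 1;
DECLARE @maxName varchar(20);
DECLARE @oldName varchar(20);
DECLARE @beg int = 1;
DECLARE @end int;

SELECT @end = MAX(_index) FROM @NewObjects;

-- Find the largest existing Fxxxxx name once, before the loop
SELECT @maxName = MAX(name)
FROM #Objects
WHERE name LIKE 'F' + REPLICATE('[0-9]', @MAX_NAME_WIDTH);

IF @maxName IS NOT NULL
    SET @counter = CAST(SUBSTRING(@maxName, 2, @MAX_NAME_WIDTH) AS int) + 1;

WHILE @beg <= @end
BEGIN
    SELECT @oldName = Name FROM @NewObjects WHERE _index = @beg;

    -- EXISTS instead of COUNT
    IF EXISTS (SELECT 1 FROM #Objects WHERE name = @oldName)
    BEGIN
        UPDATE @NewObjects
        SET newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
        WHERE _index = @beg;

        -- advance the counter only when a name was actually handed out
        SET @counter = @counter + 1;
    END

    SET @beg = @beg + 1;
END

SELECT * FROM @NewObjects;
```

On this sample data it assigns F00002, F00003 and F00004 to A, B and F00001, and leaves newName NULL for C and D.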

Let me know how these optimizations work out.

Answer 2

You should avoid queries inside loops, especially when they run against a table variable.

You could try using a temp table and indexing it on the newName column. I bet that alone would improve performance a bit.
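A sketch of that suggestion; the #NewObjects name and the index name IX_NewObjects_newName are hypothetical. Besides the index, a temp table also gets column statistics, which a table variable generally does not, so the optimizer has better row estimates for the lookups on newName inside the loop.

```sql
-- Hypothetical temp-table version of @NewObjects
IF OBJECT_ID('tempdb..#NewObjects') IS NOT NULL DROP TABLE #NewObjects;
CREATE TABLE #NewObjects
(
    _index int identity PRIMARY KEY,
    Name varchar(20),
    newName nvarchar(20)
);

-- Nonclustered index so "WHERE newName = @newName" can seek instead of scanning
CREATE NONCLUSTERED INDEX IX_NewObjects_newName ON #NewObjects (newName);

-- Literal rows instead of SplitString(@args, ',')
INSERT INTO #NewObjects (Name) VALUES ('A'), ('B'), ('C');
```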

But you would be better off rewriting it to avoid those loops with queries altogether.

Setting up the environment for testing:

    --this would be your Objects table; I fill it with some values for testing
    DECLARE @Objects TABLE
    (
        _index int identity PRIMARY KEY,
        Name varchar(20)

    )
    insert into @Objects(name)
    values('A'),('A1'),('B'),('F00001')

    --the parameter of your procedure
    declare @args varchar(MAX)
    set @args = 'A,B,C,D,F00001'

    --@NewObjects2 is your @NewObjects; I named it with a 2 because I ran your solution alongside it when testing

    DECLARE @NewObjects2 TABLE
    (
        _index int identity PRIMARY KEY,
        Name varchar(20),
        newName nvarchar(20)
    )

    INSERT INTO @NewObjects2 (Name)
    SELECT * from SplitString(@args, ',')

    declare @end int
    select @end = MAX(_index) from @NewObjects2
    DECLARE @MAX_NAME_WIDTH int = 5

Up to this point, the solution is very similar to yours.

Now, here is what I would do instead of your loop:

--generate enough free newNames in the format FXXXXX to give a newName to every row in @NewObjects2
--you should alter this to find the greatest FXXXXX name in Objects and start generating newNames from that point, to avoid the overhead of creating newNames that will surely not be used
with N_free as 
(
     select 
         0 as [count],
         'F' + REPLACE(STR(0, @MAX_NAME_WIDTH, 0), ' ', '0') as [newName],
         0 as fl_free,
         0 as count_free

     union all 

     select 
         N.[count] + 1 as [count],
         'F' + REPLACE(STR(N.[count]+1, @MAX_NAME_WIDTH, 0), ' ', '0') as [newName],
         OA.fl_free,
         count_free + OA.fl_free as count_free
     from 
         N_free N
     outer apply 
         (select 
              case 
                 when not exists(select name from @Objects
                                 where Name = 'F' + REPLACE(STR(N.[count]+1, @MAX_NAME_WIDTH, 0), ' ', '0')) 
                    then 1 
                 else 0 
              end as fl_free) OA
    where 
        N.count_free < @end
)
--return only those newNames that are free to be used
    ,newNames as (select  ROW_NUMBER() over (order by [count]) as _index_name
                         ,[newName] 
                  from N_free where fl_free = 1
    )
--update @NewObjects2, giving a newName to each row whose name is already used in Objects
    update N2
    set newName = V2.[newName]
    from @NewObjects2 N2
    inner join (select V._index,V.Name,newNames.[newName]
                from(   select row_number() over (partition by case when O.Name is not null 
                                                                        then 1
                                                                        else 0
                                                        end 
                                                        order by N._index) as _index_name
                                  ,N._index
                                  ,N.Name
                                  ,case when O.Name is not null 
                                        then 1
                                        else 0
                                    end as [fl_need_newName]
                            from @NewObjects2 N
                            left outer join @Objects O
                            on O.Name = N.Name
                    )V
                    left outer join newNames 
                    on newNames._index_name = V._index_name
                    and V.fl_need_newName = 1
    )V2
    on V2._index = N2._index
            option(MAXRECURSION 0)

    select * from @NewObjects2
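The comment at the top of that query suggests starting from the greatest FXXXXX name already in Objects. If numbering always continues after that name, every generated name is above it and therefore free in Objects (assuming the fixed 'F' + 5 digits format and that no other session inserts Fxxxxx names concurrently), so the recursive CTE with its per-number check can be replaced by ROW_NUMBER over the conflicting rows. This is a swapped-in set-based technique, sketched on the same sample data; unlike the CTE, it never reuses gaps below the largest existing name.

```sql
-- Same hypothetical sample data as the test environment
DECLARE @Objects TABLE (Name varchar(20));
INSERT INTO @Objects (Name) VALUES ('A'), ('A1'), ('B'), ('F00001');

DECLARE @NewObjects2 TABLE
(
    _index int identity PRIMARY KEY,
    Name varchar(20),
    newName nvarchar(20)
);
INSERT INTO @NewObjects2 (Name) VALUES ('A'), ('B'), ('C'), ('D'), ('F00001');

DECLARE @MAX_NAME_WIDTH int = 5;
DECLARE @start int = 0;
DECLARE @maxName varchar(20);

-- Largest counter already used by an Fxxxxx name (stays 0 if there is none)
SELECT @maxName = MAX(Name)
FROM @Objects
WHERE Name LIKE 'F' + REPLICATE('[0-9]', @MAX_NAME_WIDTH);

IF @maxName IS NOT NULL
    SET @start = CAST(SUBSTRING(@maxName, 2, @MAX_NAME_WIDTH) AS int);

-- Number only the rows whose name is taken, then offset the numbers past @start
WITH Conflicts AS
(
    SELECT N._index,
           ROW_NUMBER() OVER (ORDER BY N._index) AS rn
    FROM @NewObjects2 N
    WHERE EXISTS (SELECT 1 FROM @Objects O WHERE O.Name = N.Name)
)
UPDATE N
SET newName = 'F' + REPLACE(STR(@start + C.rn, @MAX_NAME_WIDTH, 0), ' ', '0')
FROM @NewObjects2 N
INNER JOIN Conflicts C ON C._index = N._index;

SELECT * FROM @NewObjects2;
```

On this data it produces the same newName values as the CTE version: F00002, F00003 and F00004 for A, B and F00001.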

With this environment, I get the same result as with your solution.

You can check whether it really produces the same results.

The result of this query is:

    _index  Name    newName
    1       A       F00002
    2       B       F00003
    3       C       NULL
    4       D       NULL
    5       F00001  F00004
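One way to run that check is a few queries against the final table, each of which should return no rows for a correct result. The sketch below hard-codes the sample environment and the result above into hypothetical table variables; in practice you would point the queries at your real tables. The fourth check covers imported names that are kept unchanged, since a generated name must not collide with those either.

```sql
-- Hypothetical copy of the sample environment and of the result above
DECLARE @Objects TABLE (Name varchar(20));
INSERT INTO @Objects (Name) VALUES ('A'), ('A1'), ('B'), ('F00001');

DECLARE @NewObjects2 TABLE (_index int PRIMARY KEY, Name varchar(20), newName nvarchar(20));
INSERT INTO @NewObjects2 (_index, Name, newName)
VALUES (1, 'A', 'F00002'), (2, 'B', 'F00003'), (3, 'C', NULL), (4, 'D', NULL), (5, 'F00001', 'F00004');

-- 1) every name that already exists in Objects received a newName
SELECT N.* FROM @NewObjects2 N
WHERE N.newName IS NULL
  AND EXISTS (SELECT 1 FROM @Objects O WHERE O.Name = N.Name);

-- 2) no generated name collides with an existing object
SELECT N.* FROM @NewObjects2 N
INNER JOIN @Objects O ON O.Name = N.newName;

-- 3) no generated name is handed out twice
SELECT newName, COUNT(*) AS uses FROM @NewObjects2
WHERE newName IS NOT NULL
GROUP BY newName
HAVING COUNT(*) > 1;

-- 4) no generated name equals an imported name that is kept as-is
SELECT N1.* FROM @NewObjects2 N1
INNER JOIN @NewObjects2 N2 ON N2.newName IS NULL AND N2.Name = N1.newName;
```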

Performance problem generating unique names
