T-SQL: flag duplicate records with a unique ID

Time: 2013-08-02 06:22:51

Tags: sql-server tsql sql-update common-table-expression duplicate-data

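There has to be a better way to do this.

I have a table in which I flag all of the unique records (the first row for each StrippedName) like this: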

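    -- Flag the first row of each StrippedName group as its own parent.
    -- ORDER BY repeats the partition column, so which row gets RowNumber = 1 is arbitrary.
    WITH
    CTE( TransId, OriginalName, StrippedName, RowNumber )
    AS
    (
        SELECT  TransID ,
                Name ,
                StrippedName,
                RN = ROW_NUMBER() OVER ( PARTITION BY StrippedName ORDER BY StrippedName )
        FROM    dbo.Members
    )
    UPDATE  dbo.Members
    SET ParenId = TransID
    WHERE   TransID IN ( SELECT TransId FROM CTE WHERE RowNumber = 1 )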

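Now I want to update all of the duplicate records (where RowNumber > 1) so that they point to that unique ParentId. Right now I'm using a UDF, which works but takes a very long time. Here is the UDF: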

    CREATE FUNCTION dbo.getParentTransId ( @TransId INT, @strippedBusName VARCHAR(200) )
    RETURNS INT
    AS
    BEGIN
        DECLARE @ParentTransId INT

        SELECT  @ParentTransId = TransId
        FROM    dbo.Members
        WHERE   StrippedBusName = @strippedBusName
          AND   ParenId IS NOT NULL

        IF  @ParentTransId IS NULL
        BEGIN
            SET @ParentTransId = @TransId
        END

        RETURN @ParentTransId
    END
    GO

Here is the update statement I'm currently using:

    UPDATE  dt
    SET dt.ParenId = dbo.getParentTransId ( dt.TransID, dt.StrippedBusName )
    FROM    dbo.Members dt
    GO

Is there a way to do the same update without using the UDF?

2 Answers:

Answer 0: (score: 1)

Here is a test script that demonstrates @gotqn's answer:

    -- create table
    CREATE TABLE DupUpdateTest
    (
        Id      INT NOT NULL PRIMARY KEY IDENTITY( 1, 1 ),
        TransId INT NOT NULL,
        OriginalName    VARCHAR(MAX),
        StrippedName    VARCHAR(MAX),
        ParentId    INT NULL
    )
    GO

    -- insert data
    INSERT  INTO dbo.DupUpdateTest VALUES (2, 'name 1', 'name1', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (3, 'name 1', 'name1', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (6, 'name 1', 'name1', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (8, 'name 2', 'name2', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (11, 'name 2', 'name2', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (15, 'name 55', 'name55', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (23, 'name 55', 'name55', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (99, 'name 100', 'name100', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (122, 'name 108', 'name108', NULL)

    -- update parent id match
    -- Note: this join matches every row that shares a StrippedName (including the row itself).
    -- When several rows match, T-SQL does not define which one UPDATE ... FROM uses.
    UPDATE dt
    SET dt.ParentId = COALESCE(dt1.TransId, dt.TransId ) -- Returns the first nonnull expression among its arguments
    FROM  dbo.DupUpdateTest dt
    LEFT JOIN dbo.DupUpdateTest dt1
        ON dt.StrippedName   = dt1.StrippedName

    -- view results
    SELECT * FROM dbo.DupUpdateTest
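
A deterministic variant, as a rough sketch: instead of a self-join, a windowed MIN over each StrippedName group can be written back through an updatable CTE in a single pass. This assumes the parent of a group should be the row with the lowest TransId, which the question does not state. The temp table #DupDemo and the column alias FirstTransId are made up for illustration.

```sql
-- Hypothetical temp table mirroring the relevant columns of the test table.
CREATE TABLE #DupDemo
(
    TransId      INT NOT NULL PRIMARY KEY,
    StrippedName VARCHAR(200) NOT NULL,
    ParentId     INT NULL
);

INSERT INTO #DupDemo (TransId, StrippedName)
VALUES (2, 'name1'), (3, 'name1'), (6, 'name1'),
       (8, 'name2'), (11, 'name2'),
       (99, 'name100');

-- Assumption: the parent of each group is the row with the lowest TransId.
-- The windowed MIN gives every row its group's lowest TransId, and the
-- UPDATE writes that value back through the CTE to the base table.
WITH Grouped AS
(
    SELECT  ParentId,
            FirstTransId = MIN(TransId) OVER ( PARTITION BY StrippedName )
    FROM    #DupDemo
)
UPDATE  Grouped
SET     ParentId = FirstTransId;

-- view results
SELECT  TransId, StrippedName, ParentId
FROM    #DupDemo
ORDER BY TransId;

DROP TABLE #DupDemo;
```

With these sample rows, TransId 2, 3 and 6 should all end up with ParentId 2, TransId 8 and 11 with ParentId 8, and TransId 99 keeps its own value. Because every row takes its value from the group minimum, the result does not depend on which join match the engine happens to pick.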

Answer 1: (score: 0)

I think this should work:

    UPDATE  dt
    SET dt.ParenId = COALESCE( (SELECT m.TransID FROM dbo.Members m WHERE m.StrippedBusName = dt.StrippedBusName AND m.ParenId IS NOT NULL), dt.TransID )
    FROM    dbo.Members dt

I'm not sure, but the following approach should be faster:

    UPDATE  dt
    SET dt.ParenId = COALESCE(dt1.TransId, dt.TransId )
    FROM  dbo.Members dt
    LEFT JOIN dbo.Members dt1
        ON dt.StrippedBusName   = dt1.StrippedBusName
        AND  dt1.ParenId IS NOT NULL
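
To check the result of either update, a query along these lines may help. It is only a sketch against the dbo.Members table and the ParenId / StrippedBusName columns as they appear in the question; the aliases MemberCount and DistinctParents are made up. It lists name groups whose rows do not all share exactly one non-NULL parent, so an empty result suggests the update did what was intended.

```sql
-- Sanity check (a sketch): find name groups that were not fully resolved.
SELECT  StrippedBusName,
        MemberCount     = COUNT(*),
        DistinctParents = COUNT(DISTINCT ParenId)
FROM    dbo.Members
GROUP BY StrippedBusName
HAVING  COUNT(DISTINCT ParenId) <> 1      -- more than one parent, or none at all
     OR COUNT(ParenId) <> COUNT(*);       -- some rows still have a NULL ParenId
```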