T-SQL duplicate records

Time: 2016-03-10 21:14:30

Tags: sql-server tsql

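I am trying to delete every other record, because each of those is a duplicate. My SELECT query below returns the affected rows, where every other record is a copy of the one before it. tblPoints.ptUser_ID is the unique id.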

SELECT *, u.usMembershipID
  FROM [ABCRewards].[dbo].[tblPoints]
  inner join tblUsers u on u.User_ID = tblPoints.ptUser_ID
  where ptUser_ID in (select user_id from tblusers where Client_ID = 8)
  and ptCreateDate >= '3/9/2016'
  and ptDesc = 'December Anniversary'

2 answers:

Answer 0 (score: 1)

Duplicates returned by an INNER JOIN usually indicate a problem with the query, but if you are sure your join is correct, this will do it:

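;WITH CTE
     AS (-- Number the rows within each ptUser_ID group. Because the ORDER BY uses the
         -- partition column itself, which row receives rn = 1 is arbitrary.
         SELECT *
              , ROW_NUMBER() OVER(PARTITION BY t.ptUser_ID ORDER BY t.ptUser_ID) AS rn
         FROM [ABCRewards].[dbo].[tblPoints] AS t)

/*Review duplicates (a CTE must be followed by a single statement that uses it)*/
SELECT *
FROM CTE
WHERE rn > 1;

/*To delete duplicates, replace the SELECT above with the statement below*/
     --DELETE
     --FROM CTE
     --WHERE rn > 1;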
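
Before deleting anything, it may be worth checking the first point above, namely whether the duplicates really live in tblPoints or are produced by the join. The two queries below are only a sketch: they reuse the table and column names from the question, and they assume that User_ID is meant to be unique in tblUsers and that the current database is ABCRewards.

```sql
-- 1) Does the join fan out? Any row returned here means a user appears more
--    than once in tblUsers, so the INNER JOIN itself multiplies the result.
SELECT u.User_ID, COUNT(*) AS user_rows
FROM tblUsers AS u
WHERE u.Client_ID = 8
GROUP BY u.User_ID
HAVING COUNT(*) > 1;

-- 2) Are there duplicates in tblPoints on its own (no join involved)?
SELECT p.ptUser_ID, COUNT(*) AS point_rows
FROM [ABCRewards].[dbo].[tblPoints] AS p
WHERE p.ptCreateDate >= '3/9/2016'
  AND p.ptDesc = 'December Anniversary'
GROUP BY p.ptUser_ID
HAVING COUNT(*) > 1;
```

If only the first query returns rows, the fix belongs in the join rather than in a DELETE.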

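Answer 1 (score: 1)

When cleaning up duplicated data, I always use the same query pattern: it deletes every duplicate and keeps the one copy you want (the original, the most recent, whatever you choose).

Just replace everything in [] with your own table and fields.

  • [Field(s)ToDetectDuplications]: put here the field(s) that identify duplicates, meaning that two rows with the same values in these fields are copies of each other.

  • [Field(s)ToChooseWhichDupplicationIsKept]: put here the field used to choose which duplicate is kept, for example the one with the greatest value or the oldest one.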

-- Deletes every row numbered 2 or higher within its group of duplicates.
-- ORDER BY ... DESC keeps the row with the greatest value in the chosen field.
DELETE [YourTableName]
FROM [YourTableName]
INNER JOIN (SELECT [YourTablePrimaryKey],
                   I = ROW_NUMBER() OVER(PARTITION BY [Field(s)ToDetectDuplications] ORDER BY [Field(s)ToChooseWhichDupplicationIsKept] DESC)
            FROM [dbo].[YourTableName]) AS T ON [YourTableName].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
                                                 AND T.I > 1

I suggest you review what will be deleted beforehand. To do so, just replace the DELETE statement with a SELECT, as below.

SELECT  T.I,
        [YourTableName].*
FROM [YourTableName]
INNER JOIN (SELECT [YourTablePrimaryKey],
                   I = ROW_NUMBER() OVER(PARTITION BY [Field(s)ToDetectDuplications] ORDER BY [Field(s)ToChooseWhichDupplicationIsKept] DESC)
            FROM [dbo].[YourTableName]) AS T ON [YourTableName].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
                                                 AND T.I > 1
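
To make the placeholders concrete, here is a small self-contained run of the same pattern. Everything in it is made up for illustration: the @Customers table variable, its columns and the sample rows are hypothetical. CustomerID plays the role of the primary key, Email detects the duplicates, and CreatedAt chooses which copy is kept.

```sql
-- Hypothetical sample data, not taken from the question.
DECLARE @Customers TABLE (
    CustomerID INT PRIMARY KEY,
    Email      VARCHAR(100) NOT NULL,
    CreatedAt  DATE NOT NULL
);

INSERT INTO @Customers (CustomerID, Email, CreatedAt)
VALUES (1, 'ann@example.com', '2016-01-05'),
       (2, 'ann@example.com', '2016-02-10'),
       (3, 'bob@example.com', '2016-01-20'),
       (4, 'ann@example.com', '2016-03-01');

-- Number the rows per Email, newest first, then delete every row numbered above 1.
DELETE C
FROM @Customers AS C
INNER JOIN (SELECT CustomerID,
                   I = ROW_NUMBER() OVER(PARTITION BY Email ORDER BY CreatedAt DESC)
            FROM @Customers) AS T ON C.CustomerID = T.CustomerID
                                 AND T.I > 1;

-- Rows 3 and 4 remain: the only bob row and the newest ann row.
SELECT CustomerID, Email, CreatedAt
FROM @Customers
ORDER BY CustomerID;
```

Because the table variable lives only for the batch, this can be run in a scratch query window without touching real data.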

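Explanation:
Here we use ROW_NUMBER() with PARTITION BY and ORDER BY to detect the duplicates. PARTITION BY groups rows together. Choose the partition fields so that, when the data is correct, each partition holds exactly one row. Bad data then shows up as partitions with more than one row. ROW_NUMBER assigns a number to each row within its partition, so any number greater than 1 means that partition contains a duplicate. ORDER BY tells ROW_NUMBER in which order to hand out the numbers. Number 1 is kept and all the others are deleted.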
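
A quick way to see the numbering described above is to select it without deleting anything. This is only an illustration with invented values (the UserID and CreatedAt columns in the VALUES list are hypothetical). It shows how the ORDER BY direction decides which row gets number 1 and is therefore kept.

```sql
-- Invented sample rows: user 10 has two rows (a duplicate), user 11 has one.
SELECT v.UserID,
       v.CreatedAt,
       -- DESC: the newest row in each partition gets 1 and would be kept
       ROW_NUMBER() OVER(PARTITION BY v.UserID ORDER BY v.CreatedAt DESC) AS KeepNewest,
       -- ASC: the oldest row in each partition gets 1 and would be kept
       ROW_NUMBER() OVER(PARTITION BY v.UserID ORDER BY v.CreatedAt ASC)  AS KeepOldest
FROM (VALUES (10, CAST('2016-03-09' AS DATE)),
             (10, CAST('2016-03-10' AS DATE)),
             (11, CAST('2016-03-09' AS DATE))) AS v(UserID, CreatedAt)
ORDER BY v.UserID, v.CreatedAt;

-- UserID  CreatedAt   KeepNewest  KeepOldest
-- 10      2016-03-09  2           1
-- 10      2016-03-10  1           2
-- 11      2016-03-09  1           1
```

If two rows in a partition tie on the ORDER BY field, which of them receives number 1 is not guaranteed, so pick a field (or add a second one) that breaks ties.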

Example using the OP's schema and specification
Here I have tried to fill in the pattern using my guess at your database schema.

-- 8 is the Client_ID used in the question's query, not a user id.
DECLARE @clientID INT
SELECT @clientID = 8

SELECT  T.I,
        P.*
FROM [ABCRewards].[dbo].[tblPoints] AS P
-- [YourTablePrimaryKey] is still a placeholder: the question does not show the primary key of tblPoints.
INNER JOIN (SELECT [YourTablePrimaryKey],
                   I = ROW_NUMBER() OVER(PARTITION BY D.ptDesc, D.ptUser_ID ORDER BY D.ptCreateDate DESC)
            FROM [ABCRewards].[dbo].[tblPoints] AS D
            WHERE D.ptCreateDate >= '3/9/2016'
            AND D.ptDesc = 'December Anniversary'
            AND D.ptUser_ID IN (SELECT User_ID FROM tblUsers WHERE Client_ID = @clientID)
            ) AS T ON P.[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
                                                 AND T.I > 1
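
Once the SELECT returns only the rows you expect to lose, the delete itself could look something like the sketch below. Since the question does not show the primary key of tblPoints, this version deletes through a CTE, as in the other answer, so no key column is needed. It assumes the column names and filters from the question's query, that the current database is ABCRewards, and that keeping the row with the latest ptCreateDate is what you want. It runs inside a transaction that is rolled back, so nothing is removed until you change the last line.

```sql
BEGIN TRANSACTION;

WITH Dups AS (
    -- Number each user's 'December Anniversary' rows, newest first.
    SELECT ROW_NUMBER() OVER(PARTITION BY p.ptUser_ID, p.ptDesc
                             ORDER BY p.ptCreateDate DESC) AS rn
    FROM [ABCRewards].[dbo].[tblPoints] AS p
    WHERE p.ptCreateDate >= '3/9/2016'
      AND p.ptDesc = 'December Anniversary'
      AND p.ptUser_ID IN (SELECT u.User_ID
                          FROM tblUsers AS u
                          WHERE u.Client_ID = 8)
)
-- Deleting from the CTE removes the underlying tblPoints rows.
DELETE FROM Dups
WHERE rn > 1;

-- How many rows the DELETE affected.
SELECT @@ROWCOUNT AS rows_deleted;

-- Safety net: switch to COMMIT TRANSACTION once the count looks right.
ROLLBACK TRANSACTION;
```

Comparing rows_deleted with the row count of the review SELECT is a cheap sanity check before committing.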