Question

I'm working with a set of data in a table. For simplicity, I have the following table and some sample data:

Some of the data in this table comes from a different source, namely the rows where cqmRecordID IS NOT NULL.

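I need to find the duplicate values in this table and delete the duplicates that came from the other source (the ones with a cqmRecordID). A record is considered a duplicate if it has the same values in these columns: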

[Name]
CAST([CreatedDate] AS date)
[CreatedBy]

So in my sample data above, records #5 and #6 would be considered duplicates.
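The original sample table is not reproduced here. The sketch below therefore uses a few hypothetical rows to show the three-column duplicate key from the list above.

Some assumptions apply:
- It runs against an in-memory SQLite database through Python's `sqlite3` module, standing in for SQL Server.
- SQLite has no `date` type, so `date(CreatedDate)` plays the role of `CAST([CreatedDate] AS date)`.
- The column types are assumed.

```python
import sqlite3

# Hypothetical rows (RecordID, cqmRecordID, Name, CreatedDate, CreatedBy);
# these are NOT the asker's sample data, which is not shown.
ROWS = [
    (1, None, "Pump", "2019-05-01 08:00:00", "alice"),
    (2, None, "Valve", "2019-05-01 09:00:00", "bob"),
    (3, 900, "Valve", "2019-05-02 09:00:00", "bob"),  # different day: not a duplicate
    (4, None, "Seal", "2019-05-03 10:00:00", "carol"),
    (5, None, "Gear", "2019-05-04 07:30:00", "dave"),
    (6, 901, "Gear", "2019-05-04 16:45:00", "dave"),  # same key as 5, from the cqm source
]


def make_db(rows):
    """Create an in-memory stand-in for vmsNCR (column types are assumed)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE vmsNCR (RecordID INTEGER PRIMARY KEY, cqmRecordID INTEGER,"
        " Name TEXT, CreatedDate TEXT, CreatedBy TEXT)"
    )
    con.executemany("INSERT INTO vmsNCR VALUES (?, ?, ?, ?, ?)", rows)
    return con


# Group on the three-part duplicate key; date() drops the time of day.
DUP_GROUPS = """
SELECT Name, date(CreatedDate) AS CreatedDay, CreatedBy, COUNT(*) AS n
FROM vmsNCR
GROUP BY Name, date(CreatedDate), CreatedBy
HAVING COUNT(*) > 1
"""

con = make_db(ROWS)
groups = con.execute(DUP_GROUPS).fetchall()
print(groups)  # [('Gear', '2019-05-04', 'dave', 2)]
```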

As a solution, I came up with the following two queries:

Query #1:

 select * from (
  select recordid, cqmrecordid, ROW_NUMBER() over (partition by name, cast(createddate as date), createdby 
                                                   order by cqmrecordid, recordid) as rownum
  from vmsNCR  ) A
  where cqmrecordid is not null   
  order by recordid
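As written, Query #1 lists every row that has a cqmRecordID together with its rownum, but it does not filter on rownum.

SQL Server sorts NULLs first in ascending order. A row without a cqmRecordID therefore gets rownum = 1 in its group, and adding `rownum > 1` should leave only the cqm rows that duplicate another row.

The sketch below tries that filter on hypothetical rows. It uses SQLite (through Python's `sqlite3`) as a stand-in, which assumes:
- SQLite also sorts NULLs first.
- The SQLite version is 3.25 or later, for window functions.
- `date()` replaces `CAST(... AS date)`.

```python
import sqlite3

# Hypothetical rows (RecordID, cqmRecordID, Name, CreatedDate, CreatedBy),
# not the asker's data; column types below are assumptions.
ROWS = [
    (1, None, "Pump", "2019-05-01 08:00:00", "alice"),
    (2, None, "Valve", "2019-05-01 09:00:00", "bob"),
    (3, 900, "Valve", "2019-05-02 09:00:00", "bob"),
    (4, None, "Seal", "2019-05-03 10:00:00", "carol"),
    (5, None, "Gear", "2019-05-04 07:30:00", "dave"),
    (6, 901, "Gear", "2019-05-04 16:45:00", "dave"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE vmsNCR (RecordID INTEGER PRIMARY KEY, cqmRecordID INTEGER,"
    " Name TEXT, CreatedDate TEXT, CreatedBy TEXT)"
)
con.executemany("INSERT INTO vmsNCR VALUES (?, ?, ?, ?, ?)", ROWS)

# Query #1 with an extra "rownum > 1" filter.
# NULL cqmRecordID sorts first, so a non-cqm row takes rownum = 1 in its group.
QUERY_1_FILTERED = """
SELECT RecordID, cqmRecordID, rownum
FROM (
    SELECT RecordID, cqmRecordID,
           ROW_NUMBER() OVER (PARTITION BY Name, date(CreatedDate), CreatedBy
                              ORDER BY cqmRecordID, RecordID) AS rownum
    FROM vmsNCR
) AS A
WHERE cqmRecordID IS NOT NULL AND rownum > 1
ORDER BY RecordID
"""

cqm_duplicates = con.execute(QUERY_1_FILTERED).fetchall()
print(cqm_duplicates)  # [(6, 901, 2)]
```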

Query #2:

  select A.recordID, A.cqmRecordID, B.RecordID, B.cqmRecordID 
  from vmsNCR A 
  join vmsNCR B
    on A.Name = B.Name 
    and cast(A.CreatedDate as date) = cast(B.CreatedDate as date) 
    and A.CreatedBy = B.CreatedBy
    and A.RecordID != B.RecordID 
    and A.cqmRecordID is not null 
  order by A.RecordID
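One practical difference between the two queries is separate from performance. Query #2 returns one row per matching pair, so a cqm row with two twins appears twice. An `EXISTS` version returns each cqm duplicate once.

The sketch below shows this on hypothetical rows. It adds a second hypothetical twin, RecordID 7, to create the two-twin case. It uses SQLite via Python's `sqlite3` as a stand-in, with `date()` for `CAST(... AS date)`. `B.RecordID` is added to the ORDER BY only to make the output order deterministic.

```python
import sqlite3

# Hypothetical rows (RecordID, cqmRecordID, Name, CreatedDate, CreatedBy); types assumed.
ROWS = [
    (3, 900, "Valve", "2019-05-02 09:00:00", "bob"),
    (5, None, "Gear", "2019-05-04 07:30:00", "dave"),
    (6, 901, "Gear", "2019-05-04 16:45:00", "dave"),
    (7, None, "Gear", "2019-05-04 12:00:00", "dave"),  # hypothetical second twin of 6
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE vmsNCR (RecordID INTEGER PRIMARY KEY, cqmRecordID INTEGER,"
    " Name TEXT, CreatedDate TEXT, CreatedBy TEXT)"
)
con.executemany("INSERT INTO vmsNCR VALUES (?, ?, ?, ?, ?)", ROWS)

# Query #2: one output row per (cqm row, twin) pair.
SELF_JOIN = """
SELECT A.RecordID, A.cqmRecordID, B.RecordID, B.cqmRecordID
FROM vmsNCR AS A
JOIN vmsNCR AS B
  ON A.Name = B.Name
 AND date(A.CreatedDate) = date(B.CreatedDate)
 AND A.CreatedBy = B.CreatedBy
 AND A.RecordID != B.RecordID
 AND A.cqmRecordID IS NOT NULL
ORDER BY A.RecordID, B.RecordID
"""

# Same condition as a semi-join: each cqm row appears at most once.
WITH_EXISTS = """
SELECT A.RecordID, A.cqmRecordID
FROM vmsNCR AS A
WHERE A.cqmRecordID IS NOT NULL
  AND EXISTS (SELECT 1
              FROM vmsNCR AS B
              WHERE B.Name = A.Name
                AND date(B.CreatedDate) = date(A.CreatedDate)
                AND B.CreatedBy = A.CreatedBy
                AND B.RecordID <> A.RecordID)
ORDER BY A.RecordID
"""

pairs = con.execute(SELF_JOIN).fetchall()
singles = con.execute(WITH_EXISTS).fetchall()
print(pairs)    # [(6, 901, 5, None), (6, 901, 7, None)]
print(singles)  # [(6, 901)]
```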

Is there a better way? Does one of them perform better than the other?

Answer 1

If you want to get all the rows with the duplicates removed (one row per group), then:

select t.*  -- or all columns except seqnum
from (select t.*,
             row_number() over (partition by name, cast(createddate as date), createdby
                                order by (case when cqmRecordId is not null then 1 else 2 end)
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
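The `CASE` expression decides which row of each group gets seqnum = 1. As written, it ranks rows that have a cqmRecordId first, so those are the ones kept. The question, however, wants to drop them.

The hypothetical sketch below runs the query both ways. Assumptions:
- SQLite via Python's `sqlite3` stands in for SQL Server.
- `date()` replaces `CAST(... AS date)`.
- `vmsNCR` replaces `t`.
- The `prefer_cqm` switch is an illustrative parameter, not part of the answer.

```python
import sqlite3

# Hypothetical rows (RecordID, cqmRecordID, Name, CreatedDate, CreatedBy); types assumed.
ROWS = [
    (1, None, "Pump", "2019-05-01 08:00:00", "alice"),
    (2, None, "Valve", "2019-05-01 09:00:00", "bob"),
    (3, 900, "Valve", "2019-05-02 09:00:00", "bob"),
    (4, None, "Seal", "2019-05-03 10:00:00", "carol"),
    (5, None, "Gear", "2019-05-04 07:30:00", "dave"),
    (6, 901, "Gear", "2019-05-04 16:45:00", "dave"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE vmsNCR (RecordID INTEGER PRIMARY KEY, cqmRecordID INTEGER,"
    " Name TEXT, CreatedDate TEXT, CreatedBy TEXT)"
)
con.executemany("INSERT INTO vmsNCR VALUES (?, ?, ?, ?, ?)", ROWS)


def keep_one(con, prefer_cqm):
    # 1 sorts before 2, so whichever branch yields 1 gets seqnum = 1 and is kept.
    cqm_rank, other_rank = (1, 2) if prefer_cqm else (2, 1)
    sql = f"""
    SELECT RecordID, cqmRecordID
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY Name, date(CreatedDate), CreatedBy
                   ORDER BY CASE WHEN cqmRecordID IS NOT NULL
                                 THEN {cqm_rank} ELSE {other_rank} END
               ) AS seqnum
        FROM vmsNCR AS t
    ) AS t
    WHERE seqnum = 1
    ORDER BY RecordID
    """
    return con.execute(sql).fetchall()


as_written = keep_one(con, prefer_cqm=True)   # the answer's CASE
flipped = keep_one(con, prefer_cqm=False)     # keeps the non-cqm row instead
print(as_written)  # [(1, None), (2, None), (3, 900), (4, None), (6, 901)]
print(flipped)     # [(1, None), (2, None), (3, 900), (4, None), (5, None)]
```

Swapping the two `THEN`/`ELSE` values is all it takes to keep the non-cqm row, which matches what the question asks for.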

If you want better performance, first add persisted computed columns, then an index:

alter table t add cqmRecordId_flag as (case when cqmRecordId is null then 0 else 1 end) persisted;
alter table t add createddate_date as (cast(createddate as date)) persisted;

Then the index:

create index idx_t_4 on t(name, createddate_date, createdby, cqmRecordId_flag desc);

Edit:

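If you actually only want to delete the rows with a NULL cqmRecordId from the table (those that have a matching row with a non-NULL cqmRecordId), you can use: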

delete t from t
    where t.cqmRecordId is null and
          exists (select 1
                  from t t2
                  where t2.name = t.name and
                        convert(date, t2.createddate) = convert(date, t.createddate) and
                        t2.createdby = t.createdby and
                        t2.cqmRecordId is not null
                 );

You can use the same logic in a SELECT to list the duplicates.
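Following that suggestion, here is a sketch of the `EXISTS` logic written as a `SELECT`. It runs on hypothetical rows, with SQLite via Python's `sqlite3` standing in for SQL Server and `date()` for `convert(date, ...)`.

The hypothetical `cqm_side` switch shows both directions:
- The rows the answer's `DELETE` would remove: NULL cqmRecordID with a cqm twin.
- The mirror image the question asks about: cqm rows with a non-cqm twin.

```python
import sqlite3

# Hypothetical rows (RecordID, cqmRecordID, Name, CreatedDate, CreatedBy); types assumed.
ROWS = [
    (1, None, "Pump", "2019-05-01 08:00:00", "alice"),
    (2, None, "Valve", "2019-05-01 09:00:00", "bob"),
    (3, 900, "Valve", "2019-05-02 09:00:00", "bob"),
    (4, None, "Seal", "2019-05-03 10:00:00", "carol"),
    (5, None, "Gear", "2019-05-04 07:30:00", "dave"),
    (6, 901, "Gear", "2019-05-04 16:45:00", "dave"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE vmsNCR (RecordID INTEGER PRIMARY KEY, cqmRecordID INTEGER,"
    " Name TEXT, CreatedDate TEXT, CreatedBy TEXT)"
)
con.executemany("INSERT INTO vmsNCR VALUES (?, ?, ?, ?, ?)", ROWS)


def twins(con, cqm_side):
    # cqm_side=False: rows WITHOUT a cqmRecordID that have a cqm twin (the answer's DELETE target).
    # cqm_side=True:  rows WITH a cqmRecordID that have a non-cqm twin (what the question deletes).
    this_row, other_row = ("IS NOT NULL", "IS NULL") if cqm_side else ("IS NULL", "IS NOT NULL")
    sql = f"""
    SELECT t.RecordID
    FROM vmsNCR AS t
    WHERE t.cqmRecordID {this_row}
      AND EXISTS (SELECT 1
                  FROM vmsNCR AS t2
                  WHERE t2.Name = t.Name
                    AND date(t2.CreatedDate) = date(t.CreatedDate)
                    AND t2.CreatedBy = t.CreatedBy
                    AND t2.cqmRecordID {other_row})
    ORDER BY t.RecordID
    """
    return [row[0] for row in con.execute(sql)]


print(twins(con, cqm_side=False))  # [5]
print(twins(con, cqm_side=True))   # [6]
```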

Answer 2

Use the following code to remove the duplicates:

;WITH CTE
AS
(
   SELECT ROW_NUMBER() OVER(
              PARTITION BY [Name],Cast([CreatedDate] as Date),[CreatedBy] 
              ORDER BY cqmRecordId  -- NULLs sort first, so a row without cqmRecordId gets Rnk = 1
           ) AS Rnk
   ,*
   FROM vmsNCR
)
DELETE FROM CTE
WHERE Rnk <> 1
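SQLite cannot `DELETE` through a CTE the way SQL Server can. The sketch below therefore expresses the same ranking with `RecordID IN (...)`, using hypothetical rows and Python's `sqlite3` as a stand-in, with `date()` in place of `CAST(... AS date)`.

It also shows the effect of `ORDER BY cqmRecordId`. NULLs sort first in both SQL Server and SQLite, so the row without a cqmRecordId keeps Rnk = 1 and the cqm duplicate is deleted.

```python
import sqlite3

# Hypothetical rows (RecordID, cqmRecordID, Name, CreatedDate, CreatedBy); types assumed.
ROWS = [
    (1, None, "Pump", "2019-05-01 08:00:00", "alice"),
    (2, None, "Valve", "2019-05-01 09:00:00", "bob"),
    (3, 900, "Valve", "2019-05-02 09:00:00", "bob"),
    (4, None, "Seal", "2019-05-03 10:00:00", "carol"),
    (5, None, "Gear", "2019-05-04 07:30:00", "dave"),
    (6, 901, "Gear", "2019-05-04 16:45:00", "dave"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE vmsNCR (RecordID INTEGER PRIMARY KEY, cqmRecordID INTEGER,"
    " Name TEXT, CreatedDate TEXT, CreatedBy TEXT)"
)
con.executemany("INSERT INTO vmsNCR VALUES (?, ?, ?, ?, ?)", ROWS)

# Same ranking as Answer 2, applied through an IN (...) subquery instead of a CTE.
DELETE_DUPLICATES = """
DELETE FROM vmsNCR
WHERE RecordID IN (
    SELECT RecordID
    FROM (
        SELECT RecordID,
               ROW_NUMBER() OVER (PARTITION BY Name, date(CreatedDate), CreatedBy
                                  ORDER BY cqmRecordID) AS Rnk
        FROM vmsNCR
    ) AS ranked
    WHERE Rnk <> 1
)
"""

con.execute(DELETE_DUPLICATES)
remaining = [row[0] for row in con.execute("SELECT RecordID FROM vmsNCR ORDER BY RecordID")]
print(remaining)  # [1, 2, 3, 4, 5]
```

If a group holds more than one row with a NULL cqmRecordId, those rows tie in the ORDER BY, and which one survives is arbitrary. Adding RecordId as a second sort key makes the result deterministic.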

Answer 3

Try the query below; it may work for you:

;WITH TestCTE
AS
(
   SELECT *,ROW_NUMBER() OVER(
              PARTITION BY [Name],Cast([CreatedDate] as Date),[CreatedBy] 
              ORDER BY RecordId
            ) AS RowNumber
   FROM vmsNCR
)
DELETE FROM TestCTE
WHERE RowNumber > 1
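`ORDER BY RecordId` keeps the row with the lowest RecordId in each group, whether or not it came from the cqm source. A cqm row can therefore survive when its RecordId happens to be lower.

The sketch below compares this with Answer 2's `ORDER BY cqmRecordId` on hypothetical rows. RecordID 7 and 8 are added so that the cqm row comes first in one group. Assumptions:
- SQLite via Python's `sqlite3` stands in for SQL Server, with `date()` in place of `CAST(... AS date)`.
- It uses the `IN (...)` form because SQLite cannot delete through a CTE.

```python
import sqlite3

# Hypothetical rows (RecordID, cqmRecordID, Name, CreatedDate, CreatedBy); types assumed.
ROWS = [
    (1, None, "Pump", "2019-05-01 08:00:00", "alice"),
    (2, None, "Valve", "2019-05-01 09:00:00", "bob"),
    (3, 900, "Valve", "2019-05-02 09:00:00", "bob"),
    (4, None, "Seal", "2019-05-03 10:00:00", "carol"),
    (5, None, "Gear", "2019-05-04 07:30:00", "dave"),
    (6, 901, "Gear", "2019-05-04 16:45:00", "dave"),
    (7, 902, "Seal", "2019-05-06 08:00:00", "erin"),  # cqm row with the LOWER RecordID
    (8, None, "Seal", "2019-05-06 09:00:00", "erin"),
]


def make_db(rows):
    """Create an in-memory stand-in for vmsNCR."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE vmsNCR (RecordID INTEGER PRIMARY KEY, cqmRecordID INTEGER,"
        " Name TEXT, CreatedDate TEXT, CreatedBy TEXT)"
    )
    con.executemany("INSERT INTO vmsNCR VALUES (?, ?, ?, ?, ?)", rows)
    return con


def survivors(order_by):
    """Delete RowNumber > 1 within each duplicate group; return the surviving RecordIDs."""
    if order_by not in ("RecordID", "cqmRecordID"):  # whitelist before formatting into SQL
        raise ValueError(order_by)
    con = make_db(ROWS)
    con.execute(f"""
    DELETE FROM vmsNCR
    WHERE RecordID IN (
        SELECT RecordID
        FROM (
            SELECT RecordID,
                   ROW_NUMBER() OVER (PARTITION BY Name, date(CreatedDate), CreatedBy
                                      ORDER BY {order_by}) AS RowNumber
            FROM vmsNCR
        ) AS ranked
        WHERE RowNumber > 1
    )""")
    return [row[0] for row in con.execute("SELECT RecordID FROM vmsNCR ORDER BY RecordID")]


print(survivors("RecordID"))     # [1, 2, 3, 4, 5, 7]  (cqm row 7 kept)
print(survivors("cqmRecordID"))  # [1, 2, 3, 4, 5, 8]  (non-cqm row 8 kept)
```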

Find duplicate values in a table where not all columns are the same
