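I am working with a set of data in a table. For simplicity, I have the following table and some sample data:
Some of the data in this table comes from a different source, for example the rows where cqmRecordID != null.
I need to find the duplicate values in this table and delete the duplicates that came from the other source (the ones with a cqmRecordID). A record is considered a duplicate if the values of these columns are the same:
So in my sample data above, record #5 and record #6 would be considered duplicates.
As a solution, I came up with the following two queries:
Query #1: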
select * from (
    select recordid, cqmrecordid,
           ROW_NUMBER() over (partition by name, cast(createddate as date), createdby
                              order by cqmrecordid, recordid) as rownum
    from vmsNCR
) A
where cqmrecordid is not null
order by recordid
Query #2:
select A.RecordID, A.cqmRecordID, B.RecordID, B.cqmRecordID
from vmsNCR A
join vmsNCR B
  on A.Name = B.Name
 and cast(A.CreatedDate as date) = cast(B.CreatedDate as date)
 and A.CreatedBy = B.CreatedBy
 and A.RecordID != B.RecordID
 and A.cqmRecordID is not null
order by A.RecordID
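Is there a better way to do this? Does one of them perform better than the other?
Answer 0 (score: 1)
If you want to get all rows with the duplicates removed, then: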
select t.* -- or all columns except seqnum
from (select t.*,
             row_number() over (partition by name, cast(createddate as date), createdby
                                order by (case when cqmRecordId is not null then 1 else 2 end)
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
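To see which row of each group survives the seqnum = 1 filter, here is a minimal sketch that runs the same query shape against SQLite through Python's sqlite3 module. The sample rows are hypothetical (the question's data is not shown here), SQLite's date() stands in for cast(... as date), and window functions need SQLite 3.25 or later, so treat it as an illustration rather than a SQL Server test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vmsNCR (RecordID INTEGER PRIMARY KEY, cqmRecordID INTEGER,"
    " Name TEXT, CreatedDate TEXT, CreatedBy TEXT)"
)
# Hypothetical sample rows: 5 and 6 share Name, calendar date and CreatedBy.
conn.executemany(
    "INSERT INTO vmsNCR VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "Widget", "2019-03-01 08:15:00", "alice"),
        (2, None, "Gadget", "2019-03-01 09:00:00", "bob"),
        (5, None, "Bolt", "2019-03-02 10:30:00", "carol"),
        (6, 101, "Bolt", "2019-03-02 16:45:00", "carol"),
    ],
)

query = """
    SELECT RecordID, cqmRecordID
    FROM (
        SELECT v.*,
               ROW_NUMBER() OVER (
                   PARTITION BY Name, date(CreatedDate), CreatedBy
                   ORDER BY CASE WHEN cqmRecordID IS NOT NULL THEN 1 ELSE 2 END
               ) AS seqnum
        FROM vmsNCR AS v
    ) AS ranked
    WHERE seqnum = 1
    ORDER BY RecordID
"""
kept = conn.execute(query).fetchall()
print(kept)  # one row per (Name, date, CreatedBy) group
```

With this ordering the row that has a cqmRecordID is ranked first, so in the duplicate pair it is record 6 that is kept and record 5 that is filtered out. Swapping the 1 and 2 in the CASE expression would keep the other row instead.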
If you want better performance, first add persisted computed columns:
alter table t add cqmRecordId_flag as (case when cqmRecordId is null then 0 else 1 end) persisted;
alter table t add createddate_date as (cast(createddate as date)) persisted;
Then create an index:
create index idx_t_4 on t(name, createddate_date, createdby, cqmRecordId_flag desc);
EDIT:
If you actually just want to delete the duplicate rows whose cqmRecordId is NULL from the table, you can use:
delete t from t
where t.cqmRecordId is null and
      exists (select 1
              from t t2
              where t2.name = t.name and
                    convert(date, t2.createddate_date) = convert(date, t.createddate_date) and
                    t2.createdby = t.createdby and
                    t2.cqmRecordId is not null
             );
You can use the same logic in a select to pick out the duplicates.
Answer 1 (score: 0)
Use the following code to eliminate the duplicates:
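As a rough sketch of that select variant, the snippet below applies the same EXISTS condition to hypothetical sample rows in SQLite via Python's sqlite3 module. The table name vmsNCR comes from the question, the rows and the duplicate_ids helper are made up for illustration, and date() replaces convert(date, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vmsNCR (RecordID INTEGER PRIMARY KEY, cqmRecordID INTEGER,"
    " Name TEXT, CreatedDate TEXT, CreatedBy TEXT)"
)
# Hypothetical rows: 5/6 are a duplicate pair, 7 comes from the other source
# but has no counterpart.
conn.executemany(
    "INSERT INTO vmsNCR VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "Widget", "2019-03-01 08:15:00", "alice"),
        (5, None, "Bolt", "2019-03-02 10:30:00", "carol"),
        (6, 101, "Bolt", "2019-03-02 16:45:00", "carol"),
        (7, 102, "Nut", "2019-03-03 11:00:00", "dave"),
    ],
)

# {outer} and {inner} are only ever filled with the fixed NULL tests below.
TEMPLATE = """
    SELECT t.RecordID
    FROM vmsNCR AS t
    WHERE t.cqmRecordID {outer}
      AND EXISTS (
          SELECT 1
          FROM vmsNCR AS t2
          WHERE t2.Name = t.Name
            AND date(t2.CreatedDate) = date(t.CreatedDate)
            AND t2.CreatedBy = t.CreatedBy
            AND t2.cqmRecordID {inner}
      )
    ORDER BY t.RecordID
"""

def duplicate_ids(outer, inner):
    """Return RecordIDs on one side of each duplicate pair (hypothetical helper)."""
    sql = TEMPLATE.format(outer=outer, inner=inner)
    return [row[0] for row in conn.execute(sql)]

null_side = duplicate_ids("IS NULL", "IS NOT NULL")  # same direction as the delete above
cqm_side = duplicate_ids("IS NOT NULL", "IS NULL")   # mirrored: rows from the other source
print(null_side, cqm_side)
```

The first call mirrors the delete in this answer; the second swaps the two NULL tests and selects the rows that carry a cqmRecordID, which is the side the question says it wants to remove. Record 7 appears in neither list because nothing matches it.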
;WITH CTE
AS
(
    SELECT ROW_NUMBER() OVER(
               PARTITION BY [Name], CAST([CreatedDate] AS DATE), [CreatedBy]
               ORDER BY cqmRecordId
           ) AS Rnk
          ,*
    FROM vmsNCR -- table name taken from the question; the original snippet had no FROM clause
)
DELETE FROM CTE
WHERE Rnk <> 1
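Which row this delete keeps depends on how NULLs sort: in an ascending ORDER BY, SQL Server places NULL first, so the row without a cqmRecordID gets Rnk 1 and survives. The sketch below reproduces that behaviour in SQLite (which also sorts NULLs first) through Python's sqlite3 module. SQLite cannot delete through a CTE, so the delete is rewritten with IN; the sample rows and the RecordID tie-breaker are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vmsNCR (RecordID INTEGER PRIMARY KEY, cqmRecordID INTEGER,"
    " Name TEXT, CreatedDate TEXT, CreatedBy TEXT)"
)
# Hypothetical sample rows: 5 and 6 form the only duplicate group.
conn.executemany(
    "INSERT INTO vmsNCR VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "Widget", "2019-03-01 08:15:00", "alice"),
        (2, None, "Gadget", "2019-03-01 09:00:00", "bob"),
        (5, None, "Bolt", "2019-03-02 10:30:00", "carol"),
        (6, 101, "Bolt", "2019-03-02 16:45:00", "carol"),
    ],
)

# Rank rows inside each group; NULL cqmRecordID sorts first, RecordID breaks ties
# (the tie-breaker is an addition so the result is deterministic).
conn.execute("""
    DELETE FROM vmsNCR
    WHERE RecordID IN (
        SELECT RecordID
        FROM (
            SELECT RecordID,
                   ROW_NUMBER() OVER (
                       PARTITION BY Name, date(CreatedDate), CreatedBy
                       ORDER BY cqmRecordID, RecordID
                   ) AS Rnk
            FROM vmsNCR
        )
        WHERE Rnk <> 1
    )
""")

remaining = [
    row[0] for row in conn.execute("SELECT RecordID FROM vmsNCR ORDER BY RecordID")
]
print(remaining)
```

On this data the row from the other source (record 6) is the one removed. Note that a group made up only of rows with a cqmRecordID would also be reduced to a single row, which may or may not be what is wanted.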
Answer 2 (score: 0)
Try the query below; it may work for you:
;WITH TestCTE
AS
(
    SELECT *, ROW_NUMBER() OVER(
                  PARTITION BY [Name], CAST([CreatedDate] AS DATE), [CreatedBy]
                  ORDER BY RecordId
              ) AS RowNumber
    FROM vmsNCR -- table name taken from the question; the original snippet had no FROM clause
)
DELETE FROM TestCTE
WHERE RowNumber > 1