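I'm trying to identify duplicate email addresses. Each record has a unique ID, and I want to decide which ID to keep (the primary) based on several criteria. I built the query below, but it doesn't produce the expected results. Ideally, I want to find the duplicate email addresses, pick the primary ID using the following criteria in order of importance (status, then lastcontactdate, lastactivitydate, contacttype, and entrydate, as in the ORDER BY of the query below), and list the duplicates along with their details. If the duplicates are tied at one priority level, the query should move on to the next level, and so on, until the primary can be told apart from the secondary IDs. There can be more than one secondary ID, so the primary ID has to be repeated on every row next to each secondary ID.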
Although I can generate the list of duplicate email addresses, the logic that decides which ID is the primary seems to be wrong.
```sql
select
    primary.email as primaryEmail,
    primary.id as primaryID,
    primary.entrydate as primaryentrydate,
    primary.lastcontactdate as primarylastcontactdate,
    primary.lastactivitydate as primarylastactivitydate,
    primary.status as primarystatus,
    primary.contacttype as primarycontacttype,
    secondary.id as secondaryID,
    secondary.entrydate as secondaryentrydate,
    secondary.lastcontactdate as secondarylastcontactdate,
    secondary.lastactivitydate as secondarylastactivitydate,
    secondary.status as secondarystatus,
    secondary.contacttype as secondarycontacttype
from (
    -- candidate primary rows
    select
        x.email,
        x.entrydate,
        x.id,
        x.lastcontactdate,
        x.lastactivitydate,
        x.status,
        x.contacttype
    from
        mytable x
        join (
            -- rank the rows for each email by the priority criteria
            select
                email,
                entrydate,
                lastcontactdate,
                lastactivitydate,
                status,
                contacttype,
                row_number() over (partition by email
                    order by (case
                        when status IN ('Urgent') then 1
                        when status IN ('High') then 2
                        when status IN ('Medium','Medium-Low') then 3
                        when status IN ('Low','Low-Low') then 4
                        when status IN ('') then 5
                        else 6 end),
                    coalesce(max(lastcontactdate),lastcontactdate),
                    coalesce(max(lastactivitydate),lastactivitydate),
                    (case
                        when contacttype IN ('Contacted','contacted','CONTACTED') then 1
                        when contacttype IN ('Incorrect Information') then 2
                        when contacttype IN ('NOT CONTACTED','Not Contacted','not contacted') then 3
                        when contacttype IN ('Tried','tried','TRIED') then 4
                        else 5 end),
                    entrydate desc)
            from
                mytable
            group by
                email,
                status,
                lastcontactdate,
                lastactivitydate,
                contacttype,
                entrydate
        ) y on x.email = y.email and x.lastcontactdate = y.lastcontactdate
) primary
join (
    -- every row whose email appears more than once
    select
        x.email,
        x.id,
        x.entrydate,
        x.lastactivitydate,
        x.status,
        x.lastcontactdate,
        x.contacttype
    from
        mytable x
        join (
            select
                email
            from
                mytable
            group by
                email
            having count(*) > 1
        ) y on x.email = y.email
) secondary on primary.email = secondary.email and primary.id <> secondary.id and primary.email is not null
```
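In the query above, the `row_number()` value is never given an alias or filtered on, and the `y` subquery carries no `id`, so the `primary` side is joined back to `mytable` only on email and lastcontactdate and ends up returning essentially every row rather than one row per email. Also, because lastcontactdate and lastactivitydate are in the GROUP BY, `max(lastcontactdate)` is just lastcontactdate, and those dates are sorted ascending. One possible way to express the intended ranking is to compute `ROW_NUMBER()` once per email, treat `rn = 1` as the primary, and join it to the rows with `rn > 1`. The sketch below runs such a query against an in-memory SQLite table (window functions need SQLite 3.25 or later) through Python's standard `sqlite3` module. Assumptions, not taken from the question: the sample rows are hypothetical; the most recent lastcontactdate and lastactivitydate win (`DESC`); `UPPER()` replaces listing every spelling of contacttype; `id` is added as a final tie-breaker; and the aliases are shortened to `p` and `s` because `PRIMARY` is a reserved word in dialects such as MySQL and SQL Server.

```python
import sqlite3

# Rank each email's rows; rn = 1 is the primary. Each ORDER BY key is only
# consulted when all earlier keys tie, which gives the "move to the next
# priority level" behaviour.
DEDUP_SQL = """
WITH ranked AS (
    SELECT
        id, email, entrydate, lastcontactdate, lastactivitydate, status, contacttype,
        ROW_NUMBER() OVER (
            PARTITION BY email
            ORDER BY
                CASE
                    WHEN status = 'Urgent' THEN 1
                    WHEN status = 'High' THEN 2
                    WHEN status IN ('Medium', 'Medium-Low') THEN 3
                    WHEN status IN ('Low', 'Low-Low') THEN 4
                    WHEN status = '' THEN 5
                    ELSE 6
                END,
                lastcontactdate DESC,   -- assumption: most recent contact wins
                lastactivitydate DESC,  -- assumption: most recent activity wins
                CASE
                    WHEN UPPER(contacttype) = 'CONTACTED' THEN 1
                    WHEN contacttype = 'Incorrect Information' THEN 2
                    WHEN UPPER(contacttype) = 'NOT CONTACTED' THEN 3
                    WHEN UPPER(contacttype) = 'TRIED' THEN 4
                    ELSE 5
                END,
                entrydate DESC,
                id                      -- assumption: final tie-breaker for a deterministic result
        ) AS rn
    FROM mytable
    WHERE email IS NOT NULL
)
SELECT
    p.email AS primaryEmail, p.id AS primaryID, p.entrydate AS primaryentrydate,
    p.lastcontactdate AS primarylastcontactdate, p.lastactivitydate AS primarylastactivitydate,
    p.status AS primarystatus, p.contacttype AS primarycontacttype,
    s.id AS secondaryID, s.entrydate AS secondaryentrydate,
    s.lastcontactdate AS secondarylastcontactdate, s.lastactivitydate AS secondarylastactivitydate,
    s.status AS secondarystatus, s.contacttype AS secondarycontacttype
FROM ranked AS p
JOIN ranked AS s ON s.email = p.email AND s.rn > 1  -- emails with one row produce no pairs
WHERE p.rn = 1
ORDER BY p.email, s.rn
"""


def find_duplicates(conn):
    """Return one row per (primary, secondary) pair, primary columns first."""
    return conn.execute(DEDUP_SQL).fetchall()


# Hypothetical sample data: id, email, entrydate, lastcontactdate,
# lastactivitydate, status, contacttype.
ROWS = [
    (1, "a@example.com", "2020-01-01", "2021-03-01", "2021-03-05", "High", "Contacted"),
    (2, "a@example.com", "2020-02-01", "2021-04-01", "2021-04-02", "Urgent", "tried"),
    (3, "a@example.com", "2020-03-01", "2021-02-01", "2021-02-02", "Urgent", "Contacted"),
    (4, "b@example.com", "2019-05-01", "2021-01-01", "2021-01-10", "Medium", "NOT CONTACTED"),
    (5, "b@example.com", "2019-06-01", "2021-01-01", "2021-01-10", "Medium-Low", "Contacted"),
    (6, "c@example.com", "2018-01-01", "2020-01-01", "2020-01-01", "Low", "Tried"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, email TEXT, entrydate TEXT, "
    "lastcontactdate TEXT, lastactivitydate TEXT, status TEXT, contacttype TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

rows = find_duplicates(conn)
for row in rows:
    print(row[0], "primary:", row[1], "secondary:", row[7])
```

With this sample, ID 2 beats ID 3 only on lastcontactdate (both are Urgent), and ID 5 beats ID 4 only on contacttype (Medium and Medium-Low share a tier and the dates tie), while c@example.com has no duplicate and is left out. NULL dates sort differently across engines: SQLite and SQL Server treat NULL as the lowest value, so it lands last under `DESC`, whereas PostgreSQL and Oracle put it first under `DESC` unless `NULLS LAST` is added.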