Select duplicates based on multiple criteria

Time: 2015-05-05 12:55:40

Tags: sql postgresql duplicates

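I am trying to identify duplicate email addresses. Each email address has a unique ID, and I want to decide which ID to keep (the primary) based on multiple criteria. I built the query below, but it does not produce the expected results. Ideally, I would like to identify the duplicate email addresses, pick the primary ID using the criteria below (in order of importance), and list the duplicates along with their relevant details. If the duplicates tie at one priority level, the query should move on to the next level, and so on, until the primary ID can be distinguished from the secondary IDs. There may be several secondary IDs for one email, so the primary ID must be repeated on every row that carries a secondary ID.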

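  • Status (certain statuses take priority over others)
  • Last contact date (pick the most recent date; if none of the duplicates has a last contact date, move to the next priority)
  • Last activity date (pick the most recent date; if none of the duplicates has a last activity date, move to the next priority)
  • Contact type (certain contact types take priority over others)
  • Entry date (prioritize by order of entry date)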

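While I am able to produce a list of duplicate email addresses, the logic that determines which ID is the primary appears to be incorrect.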



select 
	primary.email as primaryEmail,
	primary.id as primaryID, 
	primary.entrydate as primaryentrydate,
	primary.lastcontactdate as primarylastcontactdate,
	primary.lastactivitydate as primarylastactivitydate,
	primary.status as primarystatus,
	primary.contacttype as primarycontacttype,
    secondary.id as secondaryID, 
    secondary.entrydate as secondaryentrydate,
    secondary.lastcontactdate as secondarylastcontactdate,
    secondary.lastactivitydate as secondarylastactivitydate,
	secondary.status as secondarystatus,
	secondary.contacttype as secondarycontacttype
from (
    select 
    	x.email, 
    	x.entrydate,
    	x.id,
    	x.lastcontactdate,
    	x.lastactivitydate,
    	x.status,
    	x.contacttype
    from 
    	mytable x
    join (
        select
        	email,
        	entrydate,
        	lastcontactdate,
        	lastactivitydate,
        	status,
        	contacttype,
        	row_number() over (partition by email
			  order by (case
				when status IN ('Urgent') then 1
				when status IN ('High') then 2
				when status IN ('Medium','Medium-Low') then 3
				when status IN ('Low','Low-Low') then 4
				when status IN ('') then 5
				else 6 end),
				coalesce(max(lastcontactdate),lastcontactdate),
				coalesce(max(lastactivitydate),lastactivitydate),
				(case
				when contacttype IN ('Contacted','contacted','CONTACTED') then 1
				when contacttype IN ('Incorrect Information') then 2
				when contacttype IN ('NOT CONTACTED','Not Contacted','not contacted') then 3
				when contacttype IN ('Tried','tried','TRIED') then 4
				else 5 end),
				entrydate desc)
			from
				mytable
			group by 
				email,
				status,
				lastcontactdate,
				lastactivitydate,
				contacttype,
				entrydate
    ) y on x.email = y.email and x.lastcontactdate = y.lastcontactdate
) primary
join (
    select 
    	x.email, 
    	x.id, 
    	x.entrydate,
    	x.lastactivitydate,
    	x.status,
    	x.lastcontactdate,
    	x.contacttype
    from 
    	mytable x
    join (
        select 
        	email
        from 
        	mytable
        group by 
        	email
        having count(*) > 1
    ) y on x.email = y.email
) secondary on primary.email = secondary.email and primary.id <> secondary.id and primary.email is not null
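
For comparison, here is a minimal sketch of one way the priority rules listed above might be expressed, assuming PostgreSQL and the `mytable` columns used in the query above. It is not a confirmed fix. In the query above, the `row_number()` result is never given an alias or filtered on, so nothing restricts the primary side to a single row per email. The sketch ranks every row once and treats rank 1 as the primary. The CTE name `ranked`, the aliases `p`, `s` and `rn`, the use of `lower()` on `contacttype`, and the final `id` tie-breaker are assumptions introduced for illustration. The aliases also avoid `primary`, which is a reserved word in PostgreSQL.

```sql
-- Sketch only: assumes PostgreSQL and the mytable columns from the question.
WITH ranked AS (
    SELECT
        t.*,
        row_number() OVER (
            PARTITION BY t.email
            ORDER BY
                -- 1. Status priority (same mapping as in the question).
                CASE
                    WHEN t.status = 'Urgent' THEN 1
                    WHEN t.status = 'High' THEN 2
                    WHEN t.status IN ('Medium', 'Medium-Low') THEN 3
                    WHEN t.status IN ('Low', 'Low-Low') THEN 4
                    WHEN t.status = '' THEN 5
                    ELSE 6
                END,
                -- 2. and 3. Most recent date first. DESC sorts NULLs first
                -- by default in PostgreSQL, so NULLS LAST is needed. If all
                -- rows tie (or are all NULL), the next key decides.
                t.lastcontactdate DESC NULLS LAST,
                t.lastactivitydate DESC NULLS LAST,
                -- 4. Contact type priority. lower() is an assumption that
                -- matches any letter casing, slightly broader than the
                -- explicit spellings listed in the question.
                CASE lower(t.contacttype)
                    WHEN 'contacted' THEN 1
                    WHEN 'incorrect information' THEN 2
                    WHEN 'not contacted' THEN 3
                    WHEN 'tried' THEN 4
                    ELSE 5
                END,
                -- 5. Entry date, newest first, as in the question's query.
                t.entrydate DESC,
                -- Assumed final tie-breaker so the pick is deterministic.
                t.id
        ) AS rn
    FROM mytable t
    WHERE t.email IS NOT NULL
)
SELECT
    p.email            AS primaryemail,
    p.id               AS primaryid,
    p.entrydate        AS primaryentrydate,
    p.lastcontactdate  AS primarylastcontactdate,
    p.lastactivitydate AS primarylastactivitydate,
    p.status           AS primarystatus,
    p.contacttype      AS primarycontacttype,
    s.id               AS secondaryid,
    s.entrydate        AS secondaryentrydate,
    s.lastcontactdate  AS secondarylastcontactdate,
    s.lastactivitydate AS secondarylastactivitydate,
    s.status           AS secondarystatus,
    s.contacttype      AS secondarycontacttype
FROM ranked p
JOIN ranked s
  ON s.email = p.email
 AND s.rn > 1          -- every non-primary row for the same email
WHERE p.rn = 1         -- exactly one primary row per email
ORDER BY p.email, s.rn;
```

Because the join requires a row with `rn > 1` for the same email, emails that occur only once drop out without a separate `having count(*) > 1` subquery, and the primary ID is repeated on each secondary row.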

0 answers:

No answers