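My table contains duplicate email addresses. Each row has a unique creation date and a unique ID. For each duplicated email address, I want to identify the row with the most recent creation date and its associated ID, and also show the IDs of the duplicates together with their creation dates. I would like the query to display the results in the following format:
Note: in some cases there are more than two duplicates of an email address. I would like the query to show each additional duplicate on a new row, repeating the EmailAddress and IDKeep in those instances.
I have tried piecing together different queries found here, to no avail. I am currently at a loss - any help/direction would be greatly appreciated.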
Answer 0: (score: 1)
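Complex queries are best solved by breaking them into parts and working through them step by step.
First, let's write a query that finds the keys of the rows we want to keep: find the most recent creation date for each email, then join back to the table to get the Id: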
select x.Email, x.CreateDate, x.Id
from myTable x
join (
    select Email, max(CreateDate) as CreateDate
    from myTable
    group by Email
) y on x.Email = y.Email and x.CreateDate = y.CreateDate
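To try this step out, here is a minimal sketch that runs the query above against an in-memory SQLite database from Python. The sample rows, the column types, and the helper name rows_to_keep are assumptions made for illustration (the question does not give a schema), and an order by was added only to make the output deterministic.

```python
import sqlite3

# Hypothetical sample data: (Id, Email, CreateDate). Dates are ISO strings,
# so max() and string comparison order them chronologically.
ROWS = [
    (1, "a@example.com", "2020-01-01"),
    (2, "a@example.com", "2020-03-01"),
    (3, "a@example.com", "2020-02-01"),
    (4, "b@example.com", "2020-01-15"),
    (5, "c@example.com", "2020-05-01"),
    (6, "c@example.com", "2020-04-01"),
]

# The "rows to keep" query from the answer, plus an order by for stable output.
KEEP_QUERY = """
select x.Email, x.CreateDate, x.Id
from myTable x
join (
    select Email, max(CreateDate) as CreateDate
    from myTable
    group by Email
) y on x.Email = y.Email and x.CreateDate = y.CreateDate
order by x.Email
"""


def rows_to_keep(rows):
    """Load rows into a throwaway table and return the newest row per email."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table myTable (Id integer primary key, Email text, CreateDate text)"
    )
    conn.executemany(
        "insert into myTable (Id, Email, CreateDate) values (?, ?, ?)", rows
    )
    result = conn.execute(KEEP_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rows_to_keep(ROWS):
        print(row)
```

Note that at this stage emails without duplicates (b@example.com in the sample) still appear; they only drop out once the result is joined against the duplicate rows.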
OK, now let's write a query to get the email addresses that are duplicated:
select Email
from myTable
group by Email
having count(*) > 1
Join this query back to the table to get the keys of every row whose email has duplicates:
select x.Email, x.Id, x.CreateDate
from myTable x
join (
    select Email
    from myTable
    group by Email
    having count(*) > 1
) y on x.Email = y.Email
Great. Now all that's left is to join the first query with this one to get our result:
select keep.Email, keep.Id as IdKeep, keep.CreateDate as CreateDateOfIdKeep,
       dup.Id as DuplicateId, dup.CreateDate as CreateDateOfDuplicateId
from (
    select x.Email, x.CreateDate, x.Id
    from myTable x
    join (
        select Email, max(CreateDate) as CreateDate
        from myTable
        group by Email
    ) y on x.Email = y.Email and x.CreateDate = y.CreateDate
) keep
join (
    select x.Email, x.Id, x.CreateDate
    from myTable x
    join (
        select Email
        from myTable
        group by Email
        having count(*) > 1
    ) y on x.Email = y.Email
) dup on keep.Email = dup.Email and keep.Id <> dup.Id
Note that the final keep.Id <> dup.Id predicate in the join ensures that we don't get the same row for both keep and dup.
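As a quick check of the combined query, the following sketch runs it in an in-memory SQLite database from Python, using an email with three rows to cover the "more than two duplicates" case from the question. The sample data, the column types, and the helper name keep_and_duplicates are assumptions for illustration; the only change to the query is an added order by for deterministic output.

```python
import sqlite3

# Hypothetical sample data: (Id, Email, CreateDate) with ISO date strings.
ROWS = [
    (1, "a@example.com", "2020-01-01"),
    (2, "a@example.com", "2020-03-01"),
    (3, "a@example.com", "2020-02-01"),
    (4, "b@example.com", "2020-01-15"),  # not duplicated, should not appear
    (5, "c@example.com", "2020-05-01"),
    (6, "c@example.com", "2020-04-01"),
]

# The final query from the answer, plus an order by for stable output.
FINAL_QUERY = """
select keep.Email, keep.Id as IdKeep, keep.CreateDate as CreateDateOfIdKeep,
       dup.Id as DuplicateId, dup.CreateDate as CreateDateOfDuplicateId
from (
    select x.Email, x.CreateDate, x.Id
    from myTable x
    join (
        select Email, max(CreateDate) as CreateDate
        from myTable
        group by Email
    ) y on x.Email = y.Email and x.CreateDate = y.CreateDate
) keep
join (
    select x.Email, x.Id, x.CreateDate
    from myTable x
    join (
        select Email
        from myTable
        group by Email
        having count(*) > 1
    ) y on x.Email = y.Email
) dup on keep.Email = dup.Email and keep.Id <> dup.Id
order by keep.Email, dup.Id
"""


def keep_and_duplicates(rows):
    """Return one (Email, IdKeep, date, DuplicateId, date) row per duplicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table myTable (Id integer primary key, Email text, CreateDate text)"
    )
    conn.executemany(
        "insert into myTable (Id, Email, CreateDate) values (?, ?, ?)", rows
    )
    result = conn.execute(FINAL_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in keep_and_duplicates(ROWS):
        print(row)
```

With this sample, a@example.com yields two rows (one per older duplicate, each repeating the kept Id 2), c@example.com yields one row, and b@example.com yields none.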
Answer 1: (score: 0)
The following subquery uses a trick to get the most recent ID and creation date for each email:
select Email, max(CreateDate) as CreateDate,
substring_index(group_concat(id order by CreateDate desc), ',', 1) as id
from myTable
group by Email
having count(*) > 1;
The having clause also ensures that this applies only to duplicated emails.
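To make the trick easier to follow, here is a rough Python sketch of what the subquery computes for each email: sort the ids by creation date descending, concatenate them with commas (group_concat), then keep everything before the first comma (substring_index). This is only an illustration of the logic, not how MySQL evaluates it; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def latest_id_per_duplicated_email(rows):
    """Mimic the subquery on an iterable of (id, email, create_date) tuples.

    Returns {email: (max_create_date, id_as_string)} for duplicated emails only.
    """
    groups = defaultdict(list)
    for row_id, email, create_date in rows:
        groups[email].append((create_date, row_id))

    result = {}
    for email, entries in groups.items():
        if len(entries) <= 1:
            continue  # having count(*) > 1
        entries.sort(reverse=True)  # order by CreateDate desc
        # group_concat(id order by CreateDate desc)
        concatenated = ",".join(str(row_id) for _, row_id in entries)
        # substring_index(..., ',', 1): the text before the first comma
        first_id = concatenated.split(",", 1)[0]
        result[email] = (entries[0][0], first_id)
    return result


if __name__ == "__main__":
    sample = [
        (1, "a@example.com", "2020-01-01"),
        (2, "a@example.com", "2020-03-01"),
        (3, "a@example.com", "2020-02-01"),
        (4, "b@example.com", "2020-01-15"),
    ]
    print(latest_id_per_duplicated_email(sample))
```

One consequence visible in the sketch is that the id comes back as a string, because it is cut out of a concatenated string rather than read from the column directly.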
Then this query just needs to be combined with the rest of the data to get the desired format:
select t.Email, tkeep.id as keep_id, tkeep.CreateDate as keep_date,
       -- id and CreateDate exist in both t and tkeep, so qualify them with t
       t.id as dup_id, t.CreateDate as dup_CreateDate
from myTable t join
     (select Email, max(CreateDate) as CreateDate,
             substring_index(group_concat(id order by CreateDate desc), ',', 1) as id
      from myTable
      group by Email
      having count(*) > 1
     ) tkeep
     on t.Email = tkeep.Email and t.CreateDate <> tkeep.CreateDate;