I have a very large contacts table, and I am building an interface to help my clients deduplicate it. Here is a sample of the table's contents:
id | firstname | lastname | email | address1 | addres2 | verifiedAt |
1 | James | johnson | james@test.com | | | |
2 | David | bloggs | james@bloggs.com | | | |
3 | John | nobel | james@nobel.com | | | |
4 | Terry | jacket | james@jacket.com | | | 05/05/2013 |
5 | James | johnson | james@johnson.com| | | |
6 | James | privett | james@test.com | | | |
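I need to write a query that returns the first contact for which another contact in the same table has either the same email address or the same first name + last name.
Can this be done in a single query?
Thanks in advance.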
Answer 0 (score: 2)
Try this (SQL Fiddle).
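-- Lowest id of every group that shares an email or a first + last name.
-- Selecting t.* avoids returning the id column twice (dups.id and t.id).
SELECT DISTINCT t.*
FROM
    ( SELECT
          MIN(id) AS [id]
      FROM mytable
      GROUP BY email
      HAVING COUNT(*) > 1
      UNION ALL
      SELECT
          MIN(id) AS [id]
      FROM mytable
      GROUP BY firstname, lastname
      HAVING COUNT(*) > 1 ) dups
JOIN mytable t
    ON t.id = dups.id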
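To check what this query returns for the sample rows in the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's `sqlite3` module. The choice of SQLite, the reduced four-column schema (the address and `verifiedAt` columns are left out) and the variable names are assumptions made for illustration; the bracketed `[id]` alias is written as a plain `id`.

```python
import sqlite3

# Sample rows from the question: (id, firstname, lastname, email).
ROWS = [
    (1, "James", "johnson", "james@test.com"),
    (2, "David", "bloggs", "james@bloggs.com"),
    (3, "John", "nobel", "james@nobel.com"),
    (4, "Terry", "jacket", "james@jacket.com"),
    (5, "James", "johnson", "james@johnson.com"),
    (6, "James", "privett", "james@test.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, email TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

# Lowest id per duplicate group, by email and by first + last name.
QUERY = """
SELECT DISTINCT t.id
FROM (
    SELECT MIN(id) AS id FROM mytable
    GROUP BY email HAVING COUNT(*) > 1
    UNION ALL
    SELECT MIN(id) AS id FROM mytable
    GROUP BY firstname, lastname HAVING COUNT(*) > 1
) dups
JOIN mytable t ON t.id = dups.id
ORDER BY t.id
"""
first_ids = [row[0] for row in conn.execute(QUERY)]
print(first_ids)  # [1]
```

Contact 1 is the earliest row of both the `james@test.com` group (ids 1 and 6) and the James johnson group (ids 1 and 5), so it comes back once.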
Answer 1 (score: 1)
This works (SQLFiddle DEMO):
SELECT a.* FROM mytable a
JOIN (
    SELECT email
    FROM mytable
    GROUP BY email
    HAVING count(*) > 1
) b ON a.email = b.email
UNION
SELECT a.* FROM mytable a
JOIN (
    SELECT firstname, lastname
    FROM mytable
    GROUP BY firstname, lastname
    HAVING count(*) > 1
) b ON a.firstname = b.firstname AND a.lastname = b.lastname
To make sure this query runs fast, create at least the following indexes:
CREATE INDEX i1 ON mytable(email);
CREATE INDEX i2 ON mytable(firstname, lastname);
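Note that this query returns every contact that belongs to a duplicate group, not only the first one in each group. The sketch below illustrates that with the question's sample rows in an in-memory SQLite database via Python's `sqlite3`. SQLite, the reduced four-column schema and the variable names are assumptions for illustration only, and the example says nothing about how the indexes perform on a large table.

```python
import sqlite3

# Sample rows from the question: (id, firstname, lastname, email).
ROWS = [
    (1, "James", "johnson", "james@test.com"),
    (2, "David", "bloggs", "james@bloggs.com"),
    (3, "John", "nobel", "james@nobel.com"),
    (4, "Terry", "jacket", "james@jacket.com"),
    (5, "James", "johnson", "james@johnson.com"),
    (6, "James", "privett", "james@test.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, email TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

# The two indexes suggested in the answer.
conn.execute("CREATE INDEX i1 ON mytable(email)")
conn.execute("CREATE INDEX i2 ON mytable(firstname, lastname)")

# UNION (without ALL) removes the row that matches on both criteria.
QUERY = """
SELECT a.* FROM mytable a
JOIN (SELECT email FROM mytable
      GROUP BY email HAVING COUNT(*) > 1) b
  ON a.email = b.email
UNION
SELECT a.* FROM mytable a
JOIN (SELECT firstname, lastname FROM mytable
      GROUP BY firstname, lastname HAVING COUNT(*) > 1) b
  ON a.firstname = b.firstname AND a.lastname = b.lastname
"""
group_ids = sorted(row[0] for row in conn.execute(QUERY))
print(group_ids)  # [1, 5, 6]
```

Contacts 1 and 6 share an email and contacts 1 and 5 share a name, so all three are listed. That may suit a review screen that shows whole duplicate groups; if only the first contact per group is wanted, the rows still need to be reduced to the lowest id.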
Answer 2 (score: 0)
One approach:
with cte as
(select c.*,
        row_number() over (partition by email order by id) rnem,
        count(*) over (partition by email) ctem,
        row_number() over (partition by firstname, lastname order by id) rnfl,
        count(*) over (partition by firstname, lastname) ctfl
 from contacts c)
select * from cte
where (ctem > 1 and rnem = 1) or (ctfl > 1 and rnfl = 1)
SQLFiddle here.
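To see the window-function columns at work, here is a minimal sketch that runs the same query on the question's sample rows in an in-memory SQLite database through Python's `sqlite3`. It assumes a SQLite build with window-function support (3.25 or later); the reduced four-column schema and the variable names are also assumptions for illustration.

```python
import sqlite3

# Sample rows from the question: (id, firstname, lastname, email).
ROWS = [
    (1, "James", "johnson", "james@test.com"),
    (2, "David", "bloggs", "james@bloggs.com"),
    (3, "John", "nobel", "james@nobel.com"),
    (4, "Terry", "jacket", "james@jacket.com"),
    (5, "James", "johnson", "james@johnson.com"),
    (6, "James", "privett", "james@test.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts ("
    "id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, email TEXT)"
)
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?, ?)", ROWS)

# ct* = size of the group, rn* = position of the row inside it by id.
QUERY = """
WITH cte AS (
    SELECT c.*,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rnem,
           COUNT(*) OVER (PARTITION BY email) AS ctem,
           ROW_NUMBER() OVER (PARTITION BY firstname, lastname
                              ORDER BY id) AS rnfl,
           COUNT(*) OVER (PARTITION BY firstname, lastname) AS ctfl
    FROM contacts c
)
SELECT id, ctem, rnem, ctfl, rnfl FROM cte
WHERE (ctem > 1 AND rnem = 1) OR (ctfl > 1 AND rnfl = 1)
ORDER BY id
"""
flagged = conn.execute(QUERY).fetchall()
print(flagged)  # [(1, 2, 1, 2, 1)]
```

Only contact 1 passes the filter: it is row 1 of a two-row email group and row 1 of a two-row name group. Contacts 5 and 6 belong to those groups too, but their row numbers are 2, so they are excluded.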