SQL query needed to match similar records

Time: 2013-05-24 08:55:33

Tags: sql sql-server-2008

I have a very large contacts table, and I'm building an interface to help my clients de-duplicate it. Here is a sample of the table's contents:

id | firstname | lastname | email             | address1 | address2 | verifiedAt |
1  | James     | johnson  | james@test.com    |          |          |            |
2  | David     | bloggs   | james@bloggs.com  |          |          |            |
3  | John      | nobel    | james@nobel.com   |          |          |            |
4  | Terry     | jacket   | james@jacket.com  |          |          | 05/05/2013 |
5  | James     | johnson  | james@johnson.com |          |          |            |
6  | James     | privett  | james@test.com    |          |          |            |

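I need to write a query that returns the first contact that has another contact in the same table whose email address matches, or whose firstname + lastname matches.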

Can this be done in a single query?

Thanks in advance.
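To make the requirement concrete, here is a minimal sketch that loads the sample rows into an in-memory SQLite database (assumed only as a runnable stand-in for SQL Server 2008) and lists every pair of contacts that match on email or on first name plus last name. The table name `mytable` follows the answers below; the helpers `make_db` and `duplicate_pairs` are hypothetical, and the empty address columns are left out.

```python
import sqlite3

# Sample rows from the question; address1/address2 are omitted because they are empty.
# SQLite is only a runnable stand-in for SQL Server 2008 here (assumption).
ROWS = [
    (1, "James", "johnson", "james@test.com", None),
    (2, "David", "bloggs", "james@bloggs.com", None),
    (3, "John", "nobel", "james@nobel.com", None),
    (4, "Terry", "jacket", "james@jacket.com", "05/05/2013"),
    (5, "James", "johnson", "james@johnson.com", None),
    (6, "James", "privett", "james@test.com", None),
]


def make_db():
    """Create an in-memory copy of the sample table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, firstname TEXT, "
        "lastname TEXT, email TEXT, verifiedAt TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def duplicate_pairs(conn):
    """Return (lower id, higher id) for each pair matching on email or on full name."""
    sql = """
        SELECT a.id, b.id
        FROM mytable a
        JOIN mytable b
          ON a.id < b.id
         AND (a.email = b.email
              OR (a.firstname = b.firstname AND a.lastname = b.lastname))
        ORDER BY a.id, b.id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(duplicate_pairs(make_db()))  # [(1, 5), (1, 6)]
```

On the sample data, contact 1 pairs with contact 5 (same name) and with contact 6 (same email), so contact 1 is the "first contact" the question is after.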

3 Answers:

Answer 0 (score: 2)

Try this (SQL Fiddle):

SELECT DISTINCT *
FROM
(
    SELECT MIN(id) AS [id]
    FROM mytable
    GROUP BY email
    HAVING COUNT(*) > 1
    UNION ALL
    SELECT MIN(id) AS [id]
    FROM mytable
    GROUP BY firstname, lastname
    HAVING COUNT(*) > 1
) dups
JOIN mytable t
    ON t.id = dups.id
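A minimal sketch that runs this answer's query against the sample rows, again using in-memory SQLite as an assumed stand-in for SQL Server (the alias is written `AS id` instead of `[id]`, and only `t.id` is selected for brevity). The helpers `make_db`, `dup_ids` and `first_ids` are hypothetical. It also shows why `DISTINCT` matters: contact 1 is the lowest id in both the email group and the name group, so the `UNION ALL` subquery returns it twice.

```python
import sqlite3

# Sample rows from the question (address columns omitted); SQLite is an assumed stand-in.
ROWS = [
    (1, "James", "johnson", "james@test.com", None),
    (2, "David", "bloggs", "james@bloggs.com", None),
    (3, "John", "nobel", "james@nobel.com", None),
    (4, "Terry", "jacket", "james@jacket.com", "05/05/2013"),
    (5, "James", "johnson", "james@johnson.com", None),
    (6, "James", "privett", "james@test.com", None),
]

# The derived table from the answer: lowest id of each duplicate email group,
# followed by the lowest id of each duplicate (firstname, lastname) group.
DUPS = """
    SELECT MIN(id) AS id FROM mytable GROUP BY email HAVING COUNT(*) > 1
    UNION ALL
    SELECT MIN(id) AS id FROM mytable GROUP BY firstname, lastname HAVING COUNT(*) > 1
"""


def make_db():
    """Create an in-memory copy of the sample table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, firstname TEXT, "
        "lastname TEXT, email TEXT, verifiedAt TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def dup_ids(conn):
    """Ids produced by the UNION ALL subquery alone, repeats included."""
    return sorted(row[0] for row in conn.execute(DUPS))


def first_ids(conn):
    """Ids returned by the full query after DISTINCT and the join back to mytable."""
    sql = f"""
        SELECT DISTINCT t.id
        FROM ({DUPS}) dups
        JOIN mytable t ON t.id = dups.id
        ORDER BY t.id
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = make_db()
    print(dup_ids(conn))    # [1, 1]
    print(first_ids(conn))  # [1]
```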

Answer 1 (score: 1)

This works (SQLFiddle DEMO):

SELECT a.* FROM mytable a
JOIN (
    SELECT email
    FROM mytable
    GROUP BY email
    HAVING count(*) > 1
) b ON a.email = b.email
UNION
SELECT a.* FROM mytable a
JOIN (
    SELECT firstname, lastname
    FROM mytable
    GROUP BY firstname, lastname
    HAVING count(*) > 1
) b ON a.firstname = b.firstname AND a.lastname = b.lastname
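Running this query on the sample rows shows how it differs from the question's wording: it returns every contact that belongs to a duplicate group (1, 5 and 6), not only the first one, while `UNION` drops the second copy of contact 1 that both halves produce. The sketch assumes in-memory SQLite as a stand-in for SQL Server; `make_db` and `all_dup_ids` are hypothetical helpers.

```python
import sqlite3

# Sample rows from the question (address columns omitted); SQLite is an assumed stand-in.
ROWS = [
    (1, "James", "johnson", "james@test.com", None),
    (2, "David", "bloggs", "james@bloggs.com", None),
    (3, "John", "nobel", "james@nobel.com", None),
    (4, "Terry", "jacket", "james@jacket.com", "05/05/2013"),
    (5, "James", "johnson", "james@johnson.com", None),
    (6, "James", "privett", "james@test.com", None),
]


def make_db():
    """Create an in-memory copy of the sample table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, firstname TEXT, "
        "lastname TEXT, email TEXT, verifiedAt TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def all_dup_ids(conn):
    """Ids of every row returned by the answer's UNION query."""
    sql = """
        SELECT a.* FROM mytable a
        JOIN (
            SELECT email FROM mytable GROUP BY email HAVING count(*) > 1
        ) b ON a.email = b.email
        UNION
        SELECT a.* FROM mytable a
        JOIN (
            SELECT firstname, lastname FROM mytable
            GROUP BY firstname, lastname HAVING count(*) > 1
        ) b ON a.firstname = b.firstname AND a.lastname = b.lastname
    """
    # Column 0 of a.* is id; sorting in Python keeps the compound SELECT unchanged.
    return sorted(row[0] for row in conn.execute(sql))


if __name__ == "__main__":
    print(all_dup_ids(make_db()))  # [1, 5, 6]
```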

To make sure this query runs fast, create at least the following indexes:

 CREATE INDEX i1 ON mytable(email);
 CREATE INDEX i2 ON mytable(firstname, lastname);
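One way to check that such indexes are actually used is to inspect the query plan. The sketch below does this in SQLite with `EXPLAIN QUERY PLAN`, assumed here only as a runnable stand-in: SQL Server has its own optimizer and its own execution plans, and a six-row table says little about how either engine behaves on a very large one. `make_db`, `plan_details` and `grouping_plans` are hypothetical helpers; the index names `i1` and `i2` come from the answer.

```python
import sqlite3

# Sample rows from the question (address columns omitted); SQLite is an assumed stand-in.
ROWS = [
    (1, "James", "johnson", "james@test.com", None),
    (2, "David", "bloggs", "james@bloggs.com", None),
    (3, "John", "nobel", "james@nobel.com", None),
    (4, "Terry", "jacket", "james@jacket.com", "05/05/2013"),
    (5, "James", "johnson", "james@johnson.com", None),
    (6, "James", "privett", "james@test.com", None),
]

# The two grouping subqueries from the answer.
EMAIL_GROUPS = "SELECT email FROM mytable GROUP BY email HAVING count(*) > 1"
NAME_GROUPS = (
    "SELECT firstname, lastname FROM mytable "
    "GROUP BY firstname, lastname HAVING count(*) > 1"
)


def make_db():
    """Create an in-memory copy of the sample table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, firstname TEXT, "
        "lastname TEXT, email TEXT, verifiedAt TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def plan_details(conn, sql):
    """Return the human-readable detail text of EXPLAIN QUERY PLAN (its last column)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def grouping_plans():
    """Create the answer's indexes, then return the plans of both grouping subqueries."""
    conn = make_db()
    conn.execute("CREATE INDEX i1 ON mytable(email)")
    conn.execute("CREATE INDEX i2 ON mytable(firstname, lastname)")
    return plan_details(conn, EMAIL_GROUPS), plan_details(conn, NAME_GROUPS)


if __name__ == "__main__":
    for plan in grouping_plans():
        print(plan)
```

With these indexes SQLite can typically answer each `GROUP BY` by scanning the index in order instead of sorting the table; the exact wording of the plan text varies between SQLite versions.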

Answer 2 (score: 0)

One approach:

with cte as 
(select c.*,
        row_number() over (partition by email order by id) rnem,
        count(*) over (partition by email) ctem,
        row_number() over (partition by firstname, lastname order by id) rnfl,
        count(*) over (partition by firstname, lastname) ctfl
 from contacts c)
select * from cte
where (ctem > 1 and rnem = 1) or (ctfl > 1 and rnfl = 1)

SQLFiddle here
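The sketch below runs the same CTE on the sample rows in in-memory SQLite (an assumed stand-in for SQL Server; SQLite needs version 3.25 or newer for window functions), keeping the answer's table name `contacts`. `make_db`, `window_columns` and `first_ids` are hypothetical helpers. Exposing the four computed columns shows why contact 6 is excluded: it is the second row of its email partition (`rnem = 2`) and alone in its name partition (`ctfl = 1`).

```python
import sqlite3

# Sample rows from the question (address columns omitted); SQLite is an assumed stand-in.
ROWS = [
    (1, "James", "johnson", "james@test.com", None),
    (2, "David", "bloggs", "james@bloggs.com", None),
    (3, "John", "nobel", "james@nobel.com", None),
    (4, "Terry", "jacket", "james@jacket.com", "05/05/2013"),
    (5, "James", "johnson", "james@johnson.com", None),
    (6, "James", "privett", "james@test.com", None),
]

# The answer's CTE: position and size of each row's email and name partitions.
CTE = """
    WITH cte AS (
        SELECT c.*,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rnem,
               COUNT(*) OVER (PARTITION BY email) AS ctem,
               ROW_NUMBER() OVER (PARTITION BY firstname, lastname ORDER BY id) AS rnfl,
               COUNT(*) OVER (PARTITION BY firstname, lastname) AS ctfl
        FROM contacts c
    )
"""


def make_db():
    """Create an in-memory copy of the sample table named contacts (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts (id INTEGER PRIMARY KEY, firstname TEXT, "
        "lastname TEXT, email TEXT, verifiedAt TEXT)"
    )
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def window_columns(conn):
    """Map id -> (rnem, ctem, rnfl, ctfl) for every contact."""
    rows = conn.execute(CTE + "SELECT id, rnem, ctem, rnfl, ctfl FROM cte")
    return {row[0]: tuple(row[1:]) for row in rows}


def first_ids(conn):
    """Ids kept by the answer's filter: first row of a duplicate email or name group."""
    sql = CTE + """
        SELECT id FROM cte
        WHERE (ctem > 1 AND rnem = 1) OR (ctfl > 1 AND rnfl = 1)
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = make_db()
    print(window_columns(conn))
    print(first_ids(conn))  # [1]
```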