Question

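My code identifies potential duplicate records based on the fact that multiple rows (with different IDs) have the same value in various other columns. The results are reviewed by a human, so I'm not worried about cases such as a husband and wife who legitimately share an email address. An example of the query I'm using is below: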

SELECT DISTINCT ID, Email 
FROM Customers 
WHERE Email IS NOT NULL AND Email != '' AND Email IN
    (SELECT Email FROM Customers GROUP BY Email HAVING COUNT(DISTINCT ID) > 1) 
ORDER BY Email;

This gives me results like this:

ID       Email
108      bob@hotmail.com
381      bob@hotmail.com
205      mary@gmail.com
772      mary@gmail.com
908      mary@gmail.com

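This works well for my purposes, except when I try to match on phone numbers, which are spread across several columns (HomePhone, BusinessPhone, CellPhone). That creates two problems. The first, which is already well documented on this forum, is how to identify rows where any one of the three columns contains a matching value (if the value in [row 1, column A, B, or C] matches the value in [row 2, column A, B, or C], I want to select both rows). The second problem, which I haven't been able to work out and haven't found an answer to, is how to select [ID], [Value that matched] as my output.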

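I suppose I could select all three columns and do some further code magic in my program to make sense of them, but that would prevent me from reusing my existing code, and it also seems like the kind of hack a developer uses to avoid admitting he needs help from a DBA. (Help!) Seriously, though, I'm having a hard time finding an elegant solution, and any help would be appreciated.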

Answer 1

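Based on my understanding of the question:

You can first use union all to put the different phone numbers into a single column, then group by that column to see which numbers belong to more than one customer. After that, join back to the original table to get the customer IDs.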

-- cnts: every phone number that appears for more than one customer ID
with cnts as (
      select phone 
      from (select id,homephone phone from customers
            union all
            select id,businessphone from customers
            union all
            select id,cellphone from customers) x
      group by phone
      having count(distinct id) > 1
    )
-- join back to customers to list each ID next to the number it shares
select c.id,cn.phone value_matched
from customers c
join cnts cn on cn.phone in (c.homephone,c.businessphone,c.cellphone)
order by 1,2
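
One caveat that may be worth handling: the query above does not exclude blank phone numbers, so if missing numbers are stored as empty strings, every customer with a blank column would be reported as matching every other. The following is a minimal sketch of the same approach with the filter from the email query in the question applied. It assumes missing numbers are stored as NULL or '' (the question does not say), and the CTE name phones is only illustrative.

```sql
-- Sketch only: same union all / group by / join approach, with blanks excluded.
-- Assumption: a missing phone number is stored as NULL or '' in customers.
with phones as (
      -- unpivot the three phone columns into one (id, phone) list
      select id, homephone as phone from customers
      union all
      select id, businessphone from customers
      union all
      select id, cellphone from customers
    ),
    cnts as (
      -- keep only real numbers shared by more than one customer ID
      select phone
      from phones
      where phone is not null and phone <> ''
      group by phone
      having count(distinct id) > 1
    )
select c.id, cn.phone as value_matched
from customers c
join cnts cn on cn.phone in (c.homephone, c.businessphone, c.cellphone)
order by value_matched, c.id;
```

Ordering by the matched value first keeps the rows that share a number next to each other, as in the email output shown in the question.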

Answer 2

I would do this using apply:

select c.*
from (select c.*, v.phone, count(*) over (partition by v.phone) as cnt
      from customers c cross apply
           (select distinct v.phone
            from (values (homephone), (businessphone), (cellphone)
                 ) v(phone)
            where v.phone is not null
           ) v(phone)
     ) c
where cnt > 1
order by phone;

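The innermost subquery selects the distinct phones for each customer. The count(*) over . . . then counts the number of times each phone appears (which, because of the distinct, is the number of different customers that have it). The final where selects the phones that appear for more than one customer.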
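
The question asks for output of the form [ID], [Value that matched]. As a rough sketch, the apply approach can be narrowed to those two columns. This assumes SQL Server (cross apply and the values row constructor used this way are T-SQL features) and that missing numbers are stored as NULL or ''. The aliases d and p are illustrative, and the output name value_matched is borrowed from Answer 1.

```sql
-- Sketch only: return just the customer ID and the phone number that matched.
-- Assumption: SQL Server; a missing phone number is NULL or ''.
select d.id, d.phone as value_matched
from (select c.id, v.phone,
             -- one row per (customer, distinct phone), so this counts customers
             count(*) over (partition by v.phone) as cnt
      from customers c cross apply
           (select distinct p.phone
            from (values (c.homephone), (c.businessphone), (c.cellphone)
                 ) p(phone)
            where p.phone is not null and p.phone <> ''
           ) v
     ) d
where d.cnt > 1
order by d.phone, d.id;
```

Because the inner select distinct collapses a number that one customer has in two columns, such a customer is counted once and is not reported as a duplicate of itself.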

Find duplicate entries in any one of several columns
