Question

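I have a SQL Server database that has a unique key column and 49 columns of data elements (name/address/etc.). I have "duplicate" entries that have different keys, and I want to find those duplicate entries.

As an example, I might have "John Smith" (plus 47 other columns of information) in the table twice. The two John Smith entries will have different values in the unique key column, but apart from that, all other columns will be identical. If one of the columns is NULL, it will be NULL for both John Smith entries.

To complicate things, I need to join two tables together and then, once they are joined, find the entries where the data elements (everything except the key) are the same.

Table 1 layout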

MyKey, table2ID, Col1, Col2, Col3....Col46.

Table 2 layout

ID, col47, col48, col49

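Col1 through Col49 are where the "duplicate" data lives.

I tried something like the query below, and it almost works. It fails when I have NULL values. For example, if Col22 is NULL on both John Smith entries (i.e. they both hold the same NULL value), they are not picked up by the select.

Question: how can I get something like the query below to work even when there are NULL values that need to be compared with each other?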

with MyJoinedTable as
(
    select PolicyNumber, col01, col02, col03......col49
    from table1
    inner join table2 on table2id = table2.id
)
select PolicyNumber, t1.col01, t1.col02, t1.col03.......t1.col49
from MyJoinedTable t1
inner join (select col01, col02, col03......col49
            from MyJoinedTable
            group by col01, col02, col03......col49
            having count(*) > 1) t2 
      on t1.col01 = t2.col01
      and t1.col02 = t2.col02
      .......
      and t1.col49 = t2.col49
order by t1.col01, t1.col02

Answer 1

One approach is:

select t.*
from t
where exists (select 1
              from t t2
              where t2.col1 = t.col1 and
                    t2.col2 = t.col2 and
                    . . .
                    t2.policyNumber <> t.policyNumber
             );

This works on the assumption that none of the other columns are NULL.
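If the columns can be NULL, one possible way to keep the EXISTS shape is to compare the rows with INTERSECT, which treats two NULLs as matching, whereas `=` does not. The following is only a sketch: the table variable `@t` and its three columns are hypothetical stand-ins for the real 49-column table.

```sql
-- Hypothetical three-column stand-in for the real table.
DECLARE @t TABLE (policyNumber int PRIMARY KEY, col1 varchar(20), col2 varchar(20));

INSERT INTO @t (policyNumber, col1, col2) VALUES
    (1, 'John', NULL),     -- same data as row 2; col2 is NULL in both
    (2, 'John', NULL),
    (3, 'Jane', 'Oak St'); -- no duplicate

SELECT t.*
FROM @t AS t
WHERE EXISTS (SELECT 1
              FROM @t AS t2
              WHERE t2.policyNumber <> t.policyNumber
                -- INTERSECT yields a row only when every column matches,
                -- and it counts NULL as matching NULL.
                AND EXISTS (SELECT t.col1, t.col2
                            INTERSECT
                            SELECT t2.col1, t2.col2));
```

With the sample rows above, this should return policy numbers 1 and 2. For the real table, both SELECT lists inside the INTERSECT would have to name all 49 data columns in the same order.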

Edit:

If you are using SQL Server, I would do this instead:

select t.*
from (select t.*,
             min(id) over (partition by col1, col2, . . . ) as min_id,
             max(id) over (partition by col1, col2, . . . ) as max_id
      from t
     ) t
where min_id <> max_id;
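Because PARTITION BY puts NULLs into the same partition, the window-function version should also cope with the NULL columns from the question. Below is a rough sketch of how it might be combined with the two-table join. The table variables `@table1` and `@table2`, the reduced column list, and the sample rows are assumptions made for illustration.

```sql
-- Hypothetical cut-down versions of table1 and table2 from the question.
DECLARE @table1 TABLE (MyKey int PRIMARY KEY, table2ID int, Col1 varchar(20), Col2 varchar(20));
DECLARE @table2 TABLE (ID int PRIMARY KEY, col47 varchar(20));

INSERT INTO @table2 (ID, col47) VALUES (10, 'TX'), (11, 'TX'), (12, NULL);
INSERT INTO @table1 (MyKey, table2ID, Col1, Col2) VALUES
    (1, 10, 'John', NULL),
    (2, 11, 'John', NULL),     -- same data as MyKey 1, including the NULL
    (3, 12, 'Jane', 'Oak St');

WITH MyJoinedTable AS (
    SELECT t1.MyKey, t1.Col1, t1.Col2, t2.col47
    FROM @table1 AS t1
    INNER JOIN @table2 AS t2 ON t1.table2ID = t2.ID
)
SELECT d.MyKey, d.Col1, d.Col2, d.col47
FROM (SELECT j.*,
             -- Rows with identical data (NULLs included) share a partition.
             MIN(j.MyKey) OVER (PARTITION BY j.Col1, j.Col2, j.col47) AS min_key,
             MAX(j.MyKey) OVER (PARTITION BY j.Col1, j.Col2, j.col47) AS max_key
      FROM MyJoinedTable AS j) AS d
WHERE d.min_key <> d.max_key   -- more than one key in the partition
ORDER BY d.Col1, d.Col2, d.MyKey;
```

With this sample data the query should return MyKey 1 and 2. No join condition on the data columns is needed, so there is no `=` comparison for a NULL to fail.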

Answer 2

Group the rows in a subquery with HAVING count(*) > 1 and join the result back to the table.

SELECT to1.policynumber,
       to1.col1,
       ...
       to1.col49
       FROM elbat to1
            INNER JOIN (SELECT ti.col1,
                               ...
                               ti.col49
                               FROM elbat ti
                               GROUP BY col1,
                                        ...
                                        col49
                               HAVING count(*) > 1) to2
                       ON to2.col1 = to1.col1
                          ...
                          AND to2.col49 = to1.col49;
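Note that GROUP BY puts NULLs in one group, but the `=` comparisons in the ON clause never match NULL to NULL, so duplicates containing NULLs drop out at the join. One way to address this, sketched here under the assumption of a hypothetical two-column table variable `@elbat`, is to spell out the NULL case for each column:

```sql
-- Hypothetical two-column stand-in for elbat.
DECLARE @elbat TABLE (policynumber int PRIMARY KEY, col1 varchar(20), col2 varchar(20));

INSERT INTO @elbat (policynumber, col1, col2) VALUES
    (1, 'John', NULL),
    (2, 'John', NULL),     -- duplicate of row 1, NULL included
    (3, 'Jane', 'Oak St');

SELECT to1.policynumber, to1.col1, to1.col2
FROM @elbat AS to1
INNER JOIN (SELECT ti.col1, ti.col2
            FROM @elbat AS ti
            GROUP BY ti.col1, ti.col2
            HAVING COUNT(*) > 1) AS to2
   -- Each column matches if the values are equal or both are NULL.
   ON (to2.col1 = to1.col1 OR (to2.col1 IS NULL AND to1.col1 IS NULL))
  AND (to2.col2 = to1.col2 OR (to2.col2 IS NULL AND to1.col2 IS NULL))
ORDER BY to1.col1, to1.col2, to1.policynumber;
```

This should return policy numbers 1 and 2 for the sample rows. On SQL Server 2022 and later, `to2.col1 IS NOT DISTINCT FROM to1.col1` should express the same per-column test more compactly; older versions need the longer form.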

Or use EXISTS.

-- The outer alias is to1 because TO is a reserved keyword in T-SQL.
SELECT to1.policynumber,
       to1.col1,
       ...
       to1.col49
       FROM elbat to1
       WHERE EXISTS (SELECT *
                            FROM elbat ti
                            WHERE ti.policynumber <> to1.policynumber
                                  AND ti.col1 = to1.col1
                                  ...
                                  AND ti.col49 = to1.col49);

Find duplicate records in SQL Server, but also return the set of unique keys for each record
