Question

I have a table that records duplicate entities (a and b) across two columns.

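The UI works like this: when I go to a's page, the database is searched for duplicates, and if one is found, a row is inserted as shown in Figure 1. If I then navigate to b's page, another copy is inserted, producing Figure 2.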

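I can't touch the code that performs the inserts. I need to be able to filter the table so that it returns only one of the two rows in Figure 2 (there are hundreds of rows with this mirrored data).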

I tried filtering with a CTE and a self-join, but I can't come up with a filter that keeps one copy and drops the other.

For example, something like this:

Select * from duplicates d1
join duplicates d2
on Entity != Duplicate

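...just filters everything out. My guess is that the solution involves row numbering plus a filter that excludes everything except row 1, but I'm not sure how to group the rows and assign the row numbers correctly to do that.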

Figure 1

Entity     Duplicate
a          b

Figure 2

Entity     Duplicate
a          b
b          a
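The row-numbering idea from the question can work if the partition key is the unordered pair, i.e. the smaller of the two values followed by the larger one, so that (a, b) and (b, a) land in the same partition. The sketch below is an illustration, not the asker's code: it runs the query in SQLite through Python's sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later), and the extra row ('a', 'c') is hypothetical data added only for contrast.

```python
import sqlite3

# Figure 2's mirrored pair, plus a hypothetical unmirrored row ('a', 'c').
ROWS = [("a", "b"), ("b", "a"), ("a", "c")]

# Number the rows inside each unordered pair {entity, duplicate}. The
# partition key puts the smaller value first, so (a, b) and (b, a) share it;
# ORDER BY entity makes the copy whose entity sorts first get rn = 1.
QUERY = """
WITH numbered AS (
    SELECT entity, duplicate,
           ROW_NUMBER() OVER (
               PARTITION BY
                   CASE WHEN entity < duplicate THEN entity ELSE duplicate END,
                   CASE WHEN entity < duplicate THEN duplicate ELSE entity END
               ORDER BY entity
           ) AS rn
    FROM duplicates
)
SELECT entity, duplicate FROM numbered WHERE rn = 1
"""


def dedupe_with_row_number(rows):
    """Return one stored row per mirrored pair (sorted for stable output)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE duplicates (entity TEXT NOT NULL, duplicate TEXT NOT NULL)"
        )
        conn.executemany("INSERT INTO duplicates VALUES (?, ?)", rows)
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(dedupe_with_row_number(ROWS))  # [('a', 'b'), ('a', 'c')]
```

Unlike a plain self-join, this never compares a row with itself, so a lone row such as ('a', 'c') simply gets rn = 1 and survives.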

Answer 1

You can order the two values so that the smaller one always comes first, then use distinct to remove the duplicates:

Select distinct
       case when entity < duplicate then entity else duplicate end as col1,
       case when entity < duplicate then duplicate else entity end as col2
from   duplicates
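One side effect of this query is worth checking: it returns normalized pairs, not necessarily rows that exist in the table. The sketch below runs the query in SQLite through Python's sqlite3 module (a stand-in for SQL Server); the row ('c', 'b') is hypothetical data, stored "backwards" to show the effect.

```python
import sqlite3

# Figure 2's mirrored pair, plus a hypothetical unmirrored row ('c', 'b').
ROWS = [("a", "b"), ("b", "a"), ("c", "b")]

# Answer 1's first query: smaller value in col1, larger in col2,
# then DISTINCT collapses the mirrored copies.
QUERY = """
SELECT DISTINCT
       CASE WHEN entity < duplicate THEN entity ELSE duplicate END AS col1,
       CASE WHEN entity < duplicate THEN duplicate ELSE entity END AS col2
FROM duplicates
"""


def normalized_pairs(rows):
    """Run the CASE/DISTINCT query and return its rows sorted."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE duplicates (entity TEXT NOT NULL, duplicate TEXT NOT NULL)"
        )
        conn.executemany("INSERT INTO duplicates VALUES (?, ?)", rows)
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(normalized_pairs(ROWS))  # [('a', 'b'), ('b', 'c')]
```

The stored row ('c', 'b') comes back as ('b', 'c'). That is fine if you only need the set of pairs, but not if you need the original rows.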

Or, with union:

Select entity, duplicate
from   duplicates
where  entity < duplicate
union
Select duplicate, entity
from   duplicates
where  entity >= duplicate
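A quick check of the union version, again in SQLite through Python's sqlite3 module as a stand-in for SQL Server. The rows ('c', 'b') and ('d', 'd') are hypothetical: the second shows that the >= in the second branch keeps rows where both columns are equal, and UNION (not UNION ALL) removes the copy produced by the mirrored pair.

```python
import sqlite3

# Figure 2's mirrored pair plus two hypothetical rows for illustration.
ROWS = [("a", "b"), ("b", "a"), ("c", "b"), ("d", "d")]

# Answer 1's second query: rows already in order pass through unchanged,
# the rest are swapped, and UNION drops the resulting duplicates.
QUERY = """
SELECT entity, duplicate
FROM   duplicates
WHERE  entity < duplicate
UNION
SELECT duplicate, entity
FROM   duplicates
WHERE  entity >= duplicate
"""


def union_pairs(rows):
    """Run the UNION query and return its rows sorted."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE duplicates (entity TEXT NOT NULL, duplicate TEXT NOT NULL)"
        )
        conn.executemany("INSERT INTO duplicates VALUES (?, ?)", rows)
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(union_pairs(ROWS))  # [('a', 'b'), ('b', 'c'), ('d', 'd')]
```

Like the CASE version, it normalizes ('c', 'b') to ('b', 'c').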

Answer 2

The most efficient approach is usually:

select d.*
from duplicates d
where d.entity < d.duplicate
union all
select d.*
from duplicates d
where d.entity > d.duplicate and
      not exists (select 1 from duplicates d2 where d2.entity = d.duplicate and d2.duplicate = d.entity);

This avoids the aggregation that group by or select distinct would require. It can also take advantage of an index on duplicates(entity, duplicate) for the not exists lookup.
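A minimal sketch of this query run in SQLite through Python's sqlite3 module (a stand-in for SQL Server), with the suggested index created first. Unlike Answer 1, it returns rows exactly as stored. The rows ('c', 'b') and ('d', 'd') are hypothetical: ('c', 'b') has no mirror, so it is kept even though entity > duplicate, while ('d', 'd') matches neither < nor >, so it is dropped.

```python
import sqlite3

# Figure 2's mirrored pair plus two hypothetical rows for illustration.
ROWS = [("a", "b"), ("b", "a"), ("c", "b"), ("d", "d")]

# Answer 2's query: keep rows already in order, plus out-of-order rows
# whose mirror image does not exist.
QUERY = """
SELECT d.*
FROM duplicates d
WHERE d.entity < d.duplicate
UNION ALL
SELECT d.*
FROM duplicates d
WHERE d.entity > d.duplicate
  AND NOT EXISTS (SELECT 1 FROM duplicates d2
                  WHERE d2.entity = d.duplicate AND d2.duplicate = d.entity)
"""


def one_row_per_pair(rows):
    """Run Answer 2's query and return its rows sorted."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE duplicates (entity TEXT NOT NULL, duplicate TEXT NOT NULL)"
        )
        # The index mentioned in the answer; it can serve the NOT EXISTS probe.
        conn.execute("CREATE INDEX ix_duplicates ON duplicates (entity, duplicate)")
        conn.executemany("INSERT INTO duplicates VALUES (?, ?)", rows)
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(one_row_per_pair(ROWS))  # [('a', 'b'), ('c', 'b')]
```

If rows with entity = duplicate can occur and should be kept, changing the first predicate to d.entity <= d.duplicate would include them.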

Answer 3

You can find the duplicates first and then filter them out.

Test data:

create table #testtb (
    entity varchar(10) not null,
    duplicate varchar(10) not null
);

insert into #testtb
values
('a', 'b'),
('b', 'a'),
('a', 'c'),
('c', 'b');

To find the duplicates:

select a.*
from #testtb a
join #testtb b
on a.duplicate = b.entity
where a.entity = b.duplicate

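But this returns both rows of each mirrored pair, and you want to keep one of them, so select only the copy to discard (the one where entity > duplicate):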

select a.*
from #testtb a
join #testtb b
on a.duplicate = b.entity
where a.entity = b.duplicate and a.entity > a.duplicate
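A quick check of the two queries above, run in SQLite through Python's sqlite3 module as a stand-in for SQL Server. As an assumption of this sketch, the T-SQL temp table #testtb becomes a plain table named testtb, since # names are not valid identifiers in SQLite without quoting. It uses an inner join; a left join returns the same rows here, because unmatched rows have b.duplicate = NULL and fail the WHERE condition.

```python
import sqlite3

# Answer 3's test data.
ROWS = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "b")]

# Rows whose mirror image also exists in the table.
MIRRORED = """
SELECT a.*
FROM testtb a
JOIN testtb b ON a.duplicate = b.entity
WHERE a.entity = b.duplicate
"""

# Same join, keeping only the copy to discard (entity sorts after duplicate).
TO_DISCARD = MIRRORED + " AND a.entity > a.duplicate"


def run(query, rows):
    """Load rows into a fresh in-memory testtb and return the sorted result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE testtb (entity TEXT NOT NULL, duplicate TEXT NOT NULL)"
        )
        conn.executemany("INSERT INTO testtb VALUES (?, ?)", rows)
        return sorted(conn.execute(query).fetchall())
    finally:
        conn.close()


print(run(MIRRORED, ROWS))    # [('a', 'b'), ('b', 'a')]
print(run(TO_DISCARD, ROWS))  # [('b', 'a')]
```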

To remove those rows from the original set:

select * 
from #testtb

except

select a.*
from #testtb a
join #testtb b
on a.duplicate = b.entity
where a.entity = b.duplicate and a.entity > a.duplicate
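The final query can be checked the same way: SQLite via Python's sqlite3 module standing in for SQL Server, with #testtb renamed to testtb (an assumption, since SQLite does not accept # in unquoted names). On this data it returns the same rows as Answer 2 and keeps ('c', 'b') as stored, whereas Answer 1 would return ('b', 'c').

```python
import sqlite3

# Answer 3's test data.
ROWS = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "b")]

# All rows, minus the "second" copy of each mirrored pair.
QUERY = """
SELECT * FROM testtb
EXCEPT
SELECT a.*
FROM testtb a
JOIN testtb b ON a.duplicate = b.entity
WHERE a.entity = b.duplicate AND a.entity > a.duplicate
"""


def filter_mirrored(rows):
    """Run the EXCEPT query against a fresh in-memory testtb."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE testtb (entity TEXT NOT NULL, duplicate TEXT NOT NULL)"
        )
        conn.executemany("INSERT INTO testtb VALUES (?, ?)", rows)
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(filter_mirrored(ROWS))  # [('a', 'b'), ('a', 'c'), ('c', 'b')]
```

EXCEPT is a set operator, so it also collapses rows that appear more than once verbatim; if exact repeats must be preserved, Answer 2's UNION ALL form does not have that effect.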

Filter a table so that rows duplicated with their columns swapped are not returned
