Question

I have a table in SQL (SSMS) that looks like this:

Group|item
1|desk
1|phone
1|book
2|desk
2|phone
3|desk
3|phone
3|book
4|Desk
4|phone
4|laptop

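I want to delete any group whose items all exist in another group. If two or more groups have exactly the same items, I want to keep just one instance of that group and delete the others.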

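In the example table above, I would keep only groups 1 and 4, because all the items in group 2 already exist in group 1, and group 3 is just a duplicate of group 1.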

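Is there a simple way to achieve this? My current solution selects the table above into a temp table and joins that table to itself on group != group. It then gets the distinct item count for the group in the right-hand table and counts how many of those items match. If the two numbers are the same, I delete that group (because this shows that all the items in that group are also in the group on the left).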

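The problem with this solution is that by inner joining the table to itself where the group numbers don't match, I have to create a table with (x^2) - x rows. The real table I want to do this on has over 30,000 rows, and I would prefer not to create a table with roughly 900 million rows.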

Also note that I have thousands of different items.
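
For reference, the current solution described above could be sketched as a single query that aggregates the self-join instead of storing it in a temp table. This is only a sketch under assumptions: the table variable `@items` and the column name `grp` are hypothetical stand-ins, each (grp, item) pair is assumed to appear at most once, the join matches on item as well as on differing group numbers, and the tie-break for identical groups (keep the lowest group number) is added here.

```sql
-- Hypothetical table variable standing in for the real table.
-- Assumes each (grp, item) pair appears at most once.
DECLARE @items TABLE (grp int NOT NULL, item varchar(50) NOT NULL);
INSERT INTO @items (grp, item) VALUES
    (1, 'desk'), (1, 'phone'), (1, 'book'),
    (2, 'desk'), (2, 'phone'),
    (3, 'desk'), (3, 'phone'), (3, 'book'),
    (4, 'Desk'), (4, 'phone'), (4, 'laptop');

WITH sizes AS (
    -- Number of items in each group.
    SELECT grp, COUNT(*) AS num_items
    FROM @items
    GROUP BY grp
),
overlaps AS (
    -- For every pair of different groups that share an item,
    -- count how many items they have in common.
    SELECT a.grp, b.grp AS other_grp, COUNT(*) AS shared_items
    FROM @items a
    JOIN @items b ON b.item = a.item AND b.grp <> a.grp
    GROUP BY a.grp, b.grp
)
SELECT s.grp
FROM sizes s
WHERE NOT EXISTS (
    SELECT 1
    FROM overlaps o
    JOIN sizes os ON os.grp = o.other_grp
    WHERE o.grp = s.grp
      AND o.shared_items = s.num_items           -- every item of s.grp is in other_grp
      AND (os.num_items > s.num_items            -- other_grp is a strict superset, or
           OR o.other_grp < s.grp)               -- an identical group with a lower number
)
ORDER BY s.grp;
-- On the sample data this keeps groups 1 and 4.
```

The join still pairs up rows that share an item before aggregating, so whether this is fast enough on the real table is something that would need to be measured.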

Answer 1

I would use NOT EXISTS:

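-- [table] is a placeholder for the real table name;
-- group and table are reserved words, so both are quoted with brackets
select distinct t.[group]
from [table] t
where not exists (select 1 from [table] t1 where t1.[group] < t.[group] and t1.item = t.item);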

group is a reserved keyword in SQL Server, so it is best not to use it as a column name; if you do, it has to be quoted as [group].
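
One thing to be aware of: this query keeps a group only if it has at least one item that no lower-numbered group has. That is not quite the same as "all items exist in one single other group". A small sketch of the difference, using a hypothetical table variable `@t` with a column named `grp`:

```sql
-- Hypothetical sample data: group 3 is not contained in any single group,
-- but each of its items appears in some lower-numbered group.
DECLARE @t TABLE (grp int NOT NULL, item varchar(20) NOT NULL);
INSERT INTO @t (grp, item) VALUES
    (1, 'desk'),
    (2, 'phone'),
    (3, 'desk'),
    (3, 'phone');

SELECT DISTINCT t.grp
FROM @t t
WHERE NOT EXISTS (SELECT 1 FROM @t t1 WHERE t1.grp < t.grp AND t1.item = t.item);
-- Returns groups 1 and 2; group 3 is dropped even though
-- neither group 1 nor group 2 contains all of its items.
```

If that situation cannot occur in your data, the simple NOT EXISTS form may be all you need.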

Answer 2

NOT EXISTS

-- sample data from the question, built inline
with cte as
(
    select * from (
        select 1 as grp, 'desk' as item union all
        select 1, 'phone' union all
        select 1, 'book' union all
        select 2, 'desk' union all
        select 2, 'phone' union all
        select 3, 'desk' union all
        select 3, 'phone' union all
        select 3, 'book' union all
        select 4, 'Desk' union all
        select 4, 'phone' union all
        select 4, 'laptop'
    ) t
)
-- keep a group if at least one of its items is in no lower-numbered group
select distinct t1.grp
from cte t1
where not exists (select 1 from cte t2 where t2.grp < t1.grp and t2.item = t1.item);
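
Since the question asks to delete the groups rather than just list the ones to keep, the same NOT EXISTS logic could be turned into a DELETE. A minimal sketch, assuming a hypothetical temp table `#group_items` in place of the real table, and wrapped in a transaction so the result can be inspected first:

```sql
-- Hypothetical temp table standing in for the real one.
CREATE TABLE #group_items (grp int NOT NULL, item varchar(50) NOT NULL);
INSERT INTO #group_items (grp, item) VALUES
    (1, 'desk'), (1, 'phone'), (1, 'book'),
    (2, 'desk'), (2, 'phone'),
    (3, 'desk'), (3, 'phone'), (3, 'book'),
    (4, 'Desk'), (4, 'phone'), (4, 'laptop');

BEGIN TRANSACTION;

WITH keep AS (
    -- Groups to keep: same logic as the SELECT above.
    SELECT DISTINCT t1.grp
    FROM #group_items t1
    WHERE NOT EXISTS (SELECT 1 FROM #group_items t2
                      WHERE t2.grp < t1.grp AND t2.item = t1.item)
)
DELETE gi
FROM #group_items gi
WHERE NOT EXISTS (SELECT 1 FROM keep k WHERE k.grp = gi.grp);

-- Only the rows of groups 1 and 4 remain at this point.
SELECT grp, item FROM #group_items ORDER BY grp, item;

ROLLBACK TRANSACTION;  -- switch to COMMIT once the result looks right
DROP TABLE #group_items;
```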

Answer 3

This is complicated. You can get the equivalent groups by doing:

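-- t is the source table (grp, item); tt is not defined in this answer and is
-- presumably t plus num_grp, the number of items in each group
select grp, min(contained_in_group)
from (select t1.grp, t2.grp as contained_in_group
      from tt t1  join
           t t2
           on t1.item = t2.item 
      group by t1.grp, t2.grp, t1.num_grp
      having count(*) = count(t2.item) and count(*) = t1.num_grp
      ) x
group by grp;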

You can see it in action in the rextester.

The actual result that you want is:

select distinct min(contained_in_group)
from (select t1.grp, t2.grp as contained_in_group
      from tt t1  join
           t t2
           on t1.item = t2.item 
      group by t1.grp, t2.grp, t1.num_grp
      having count(*) = count(t2.item) and count(*) = t1.num_grp
      ) x
group by grp;
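
The queries above refer to t and tt without showing how they are defined. A self-contained sketch, assuming t holds the sample rows from the question and tt is t with a per-group item count named num_grp (both definitions are assumptions, not taken from the original answer):

```sql
WITH t AS (
    -- Sample data from the question.
    SELECT grp, item
    FROM (VALUES
        (1, 'desk'), (1, 'phone'), (1, 'book'),
        (2, 'desk'), (2, 'phone'),
        (3, 'desk'), (3, 'phone'), (3, 'book'),
        (4, 'Desk'), (4, 'phone'), (4, 'laptop')
    ) v (grp, item)
),
tt AS (
    -- Assumed definition: each row plus the number of items in its group.
    SELECT t.grp, t.item,
           COUNT(*) OVER (PARTITION BY t.grp) AS num_grp
    FROM t
)
SELECT DISTINCT MIN(contained_in_group) AS grp_to_keep
FROM (SELECT t1.grp, t2.grp AS contained_in_group
      FROM tt t1
      JOIN t t2 ON t1.item = t2.item
      GROUP BY t1.grp, t2.grp, t1.num_grp
      -- keep (grp, other group) pairs where every item of grp matched
      HAVING COUNT(*) = t1.num_grp
     ) x
GROUP BY grp;
-- On the sample data this returns 1 and 4.
```

The join includes each group matched against itself, so every group is contained in at least one group and MIN always has a value to return.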

SSMS - Remove duplicate grouped values
