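Selecting ids for grouped, unique sets of data

Time: 2013-06-06 13:06:21

Tags: sql sqlite

My problem is to show only the ids of grouped, unique sets of data. A simple example is probably the best way to explain it: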

| id | color |
--------------
| 1  | red   |
--------------
| 1  | green |
--------------
| 1  | blue  |
--------------
| 2  | red   |
--------------
| 2  | green |
--------------
| 2  | blue  |
--------------
| 3  | red   |
--------------
| 3  | blue  |
--------------
| 3  | yellow|
--------------
| 3  | purple|
--------------

Ids 1 and 2 have the same set of data (red, green, blue), so the result table should contain only one of them, either 1 or 2:

| id |
------
| 1  |
------
| 3  | 
------

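I guess this relatively basic question has been asked many times, but I could not work out the specific keywords that would turn up a result.

2 answers:

Answer 0: (score: 1)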

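Although SQLite has group_concat(), it is of no use here, because the order of the concatenated elements is arbitrary. Otherwise it would have been the simplest way to do this.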
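
To make the ordering concern concrete, here is a minimal sketch of the "one signature per id" idea with the sorting done explicitly in Python rather than left to the concatenation order. The table name `t` (borrowed from the queries in the answers), the in-memory database and the helper name `smallest_id_per_color_set` are assumptions for illustration only.

```python
import sqlite3
from collections import defaultdict

# Sample rows from the question; the table name `t` is an assumption.
ROWS = [
    (1, "red"), (1, "green"), (1, "blue"),
    (2, "red"), (2, "green"), (2, "blue"),
    (3, "red"), (3, "blue"), (3, "yellow"), (3, "purple"),
]


def smallest_id_per_color_set(rows):
    """Return the smallest id for each distinct set of colors."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, color TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

    # Collect the colors of every id, in whatever order the rows come back.
    colors_by_id = defaultdict(set)
    for id_, color in conn.execute("SELECT id, color FROM t"):
        colors_by_id[id_].add(color)
    conn.close()

    # Sort explicitly so the signature does not depend on row order.
    smallest = {}
    for id_, colors in colors_by_id.items():
        signature = tuple(sorted(colors))
        smallest[signature] = min(id_, smallest.get(signature, id_))
    return sorted(smallest.values())


print(smallest_id_per_color_set(ROWS))
```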

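Instead, we have to think about this relationally. The idea is to do the following:

  1. Count the number of colors that two ids have in common
  2. Count the number of colors on each id
  3. Select the pairs of ids where these three values (the shared count and the two per-id counts) are equal
  4. Identify each pair by the smallest id in the pair
  5. The distinct values of that minimum are then the list you want.

    The following query takes this approach: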

    select distinct MIN(id2)
    from (select t1.id as id1, t2.id as id2, count(*) as cnt
          from t t1 join
               t t2
               on t1.color = t2.color
          group by t1.id, t2.id
         ) t1t2 join
         (select t.id, COUNT(*) as cnt
          from t
          group by t.id
         ) t1sum
         on t1t2.id1 = t1sum.id and t1sum.cnt = t1t2.cnt join
         (select t.id, COUNT(*) as cnt
          from t
          group by t.id
         ) t2sum
         on t1t2.id2 = t2sum.id and t2sum.cnt = t1t2.cnt
    group by t1t2.id1, t1t2.cnt, t1sum.cnt, t2sum.cnt
    

    I actually tested this in SQL Server, by putting this with clause in front of it:

    with t as (
          select 1 as id, 'r' as color union all
          select 1, 'g' union all
          select 1, 'b' union all
          select 2 as id, 'r' as color union all
          select 2, 'g' union all
          select 2, 'b' union all
          select 3, 'r' union all
          select 4, 'y' union all
          select 4, 'p' union all
          select 5 as id, 'r' as color union all
          select 5, 'g' union all
          select 5, 'b' union all
          select 5, 'p'
         )
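
The answer reports a test in SQL Server only. As a rough check against SQLite, the sketch below loads the same rows as the with clause above into an in-memory table and runs the query from this answer unchanged. The helper name `run_pairwise_count_query` and the use of a real table named `t` in place of the with clause are assumptions made for the example.

```python
import sqlite3

# Test rows copied from the WITH clause above.
ROWS = [
    (1, "r"), (1, "g"), (1, "b"),
    (2, "r"), (2, "g"), (2, "b"),
    (3, "r"),
    (4, "y"), (4, "p"),
    (5, "r"), (5, "g"), (5, "b"), (5, "p"),
]

# The query from this answer, unchanged.
QUERY = """
select distinct MIN(id2)
from (select t1.id as id1, t2.id as id2, count(*) as cnt
      from t t1 join
           t t2
           on t1.color = t2.color
      group by t1.id, t2.id
     ) t1t2 join
     (select t.id, COUNT(*) as cnt
      from t
      group by t.id
     ) t1sum
     on t1t2.id1 = t1sum.id and t1sum.cnt = t1t2.cnt join
     (select t.id, COUNT(*) as cnt
      from t
      group by t.id
     ) t2sum
     on t1t2.id2 = t2sum.id and t2sum.cnt = t1t2.cnt
group by t1t2.id1, t1t2.cnt, t1sum.cnt, t2sum.cnt
"""


def run_pairwise_count_query(rows):
    """Load rows into an in-memory table `t` and return the sorted result ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, color TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # The query has no ORDER BY, so sort on the Python side.
    result = sorted(row[0] for row in conn.execute(QUERY))
    conn.close()
    return result


print(run_pairwise_count_query(ROWS))
```

On these rows the ids that come back should be 1, 3, 4 and 5: id 2 collapses into id 1, and id 5 stays separate because it has one extra color. Note that the counting approach assumes each (id, color) pair appears at most once in the table; duplicate rows would distort the counts.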
    

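Answer 1: (score: 1)

SQL is set-oriented, so let's try this:

The unique ids are those for which no other id with the same set of colors exists. (The query below only compares each id against smaller ids, so the smallest id of every set is kept.)

To determine whether two ids have the same set of colors, we subtract them from each other (which is what EXCEPT does) and test whether the result is empty in both directions: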

SELECT id
FROM (SELECT DISTINCT id FROM t) AS t1
WHERE NOT EXISTS (SELECT id FROM (SELECT DISTINCT id FROM t) AS t2
                  WHERE t2.id < t1.id
                    AND NOT EXISTS (SELECT color FROM t WHERE id = t1.id
                                    EXCEPT
                                    SELECT color FROM t WHERE id = t2.id)
                    AND NOT EXISTS (SELECT color FROM t WHERE id = t2.id
                                    EXCEPT
                                    SELECT color FROM t WHERE id = t1.id));
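
To try this out locally, the following sketch runs the query above against the sample table from the question and also exposes the two-way EXCEPT test on its own. The in-memory database and the helper names `open_sample_db`, `same_color_set` and `smallest_ids` are assumptions for illustration, not part of the answer.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    (1, "red"), (1, "green"), (1, "blue"),
    (2, "red"), (2, "green"), (2, "blue"),
    (3, "red"), (3, "blue"), (3, "yellow"), (3, "purple"),
]

# The two-way EXCEPT test in isolation: true when ids :a and :b
# have exactly the same set of colors.
SAME_SET = """
SELECT NOT EXISTS (SELECT color FROM t WHERE id = :a
                   EXCEPT
                   SELECT color FROM t WHERE id = :b)
   AND NOT EXISTS (SELECT color FROM t WHERE id = :b
                   EXCEPT
                   SELECT color FROM t WHERE id = :a)
"""

# The query from this answer, unchanged.
FULL_QUERY = """
SELECT id
FROM (SELECT DISTINCT id FROM t) AS t1
WHERE NOT EXISTS (SELECT id FROM (SELECT DISTINCT id FROM t) AS t2
                  WHERE t2.id < t1.id
                    AND NOT EXISTS (SELECT color FROM t WHERE id = t1.id
                                    EXCEPT
                                    SELECT color FROM t WHERE id = t2.id)
                    AND NOT EXISTS (SELECT color FROM t WHERE id = t2.id
                                    EXCEPT
                                    SELECT color FROM t WHERE id = t1.id))
"""


def open_sample_db(rows=ROWS):
    """Create an in-memory database with table `t` filled from rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, color TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn


def same_color_set(conn, a, b):
    """True if both EXCEPT differences between ids a and b are empty."""
    return bool(conn.execute(SAME_SET, {"a": a, "b": b}).fetchone()[0])


def smallest_ids(conn):
    """Run the full query; sort in Python since it has no ORDER BY."""
    return sorted(row[0] for row in conn.execute(FULL_QUERY))


conn = open_sample_db()
print(same_color_set(conn, 1, 2), same_color_set(conn, 1, 3))
print(smallest_ids(conn))
```

Because EXCEPT works on sets of values, this variant is not thrown off by duplicate (id, color) rows.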
