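Complement of the intersection in SQL

Date: 2014-10-14 12:46:40

Tags: sql oracle join intersection

I am using Oracle SQL and I have a basic question about joins.

I have 5 tables. Each of them has the same column as its primary key: ID (int). Consider the following queries: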

select count(*) from table_a -- 100 records
select count(*) from table_b -- 200 records
select count(*) from table_c -- 150 records
select count(*) from table_d -- 100 records
select count(*) from table_e -- 120 records

select * -- 88 records
 from table_a a
  inner join table_b b
    on a.id = b.id
  inner join table_c c
    on a.id = c.id
  inner join table_d d
    on a.id = d.id
  inner join table_e e
    on a.id = e.id

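In this case, many records are left out of the output because one of the tables does not contain a given ID, even though the remaining tables do. How can I find out which records are the "bad" ones? I think what I am after is the complement of the intersection.

I want to know, for each case, which record is problematic and which table is responsible. For example: ID 123 is a "bad" record because it is missing from table_c but present in the other tables. ID 321 is a problematic record because it is present in every table except table_d.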

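5 answers:

Answer 0: (score: 6)

You are probably looking for the symmetric difference between all your tables.

To solve this kind of problem without being too clever, you need a FULL OUTER JOIN ... USING: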

SELECT id
    FROM table_a
    FULL OUTER JOIN table_b USING(id) 
    FULL OUTER JOIN table_c USING(id) 
    FULL OUTER JOIN table_d USING(id) 
    FULL OUTER JOIN table_e USING(id) 
WHERE table_a.ROWID IS NULL
   OR table_b.ROWID IS NULL
   OR table_c.ROWID IS NULL
   OR table_d.ROWID IS NULL
   OR table_e.ROWID IS NULL;

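A FULL OUTER JOIN returns all rows that satisfy the join condition (like an ordinary JOIN) as well as all rows, from either side, that have no corresponding row. The USING clause embeds an implicit COALESCE on the equijoin column.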


Another option is to use an anti-join:

SELECT id
    FROM table_a
    FULL OUTER JOIN table_b USING(id) 
    FULL OUTER JOIN table_c USING(id) 
    FULL OUTER JOIN table_d USING(id) 
    FULL OUTER JOIN table_e USING(id) 
WHERE id NOT IN (
    SELECT id
        FROM table_a
        INNER JOIN table_b USING(id) 
        INNER JOIN table_c USING(id) 
        INNER JOIN table_d USING(id) 
        INNER JOIN table_e USING(id) 
)

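Basically, this builds the union of all the sets minus the intersection of all the sets.

Graphically, you can compare INNER JOIN and FULL OUTER JOIN (restricted to 3 tables for ease of representation):

[Figures: INNER JOIN, FULL OUTER JOIN]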


Given the test case:

ID    TABLE_A TABLE_B TABLE_C TABLE_D TABLE_E
1     *       -       -       -       -
2     -       *       *       *       *
3     *       -       -       *       -
4     *       *       *       *       *
Legend: * = entry present in the table, - = entry missing

Both queries produce:

ID
1
3
2
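
As a cross-check of this result, the set logic described above (the union of all sets minus their intersection) can be sketched outside the database. This is only a minimal Python illustration, not Oracle code: the set contents are transcribed from the test case table, and the variable names are made up for the example.

```python
# Membership of each ID per table, transcribed from the test case above.
tables = {
    "table_a": {1, 3, 4},
    "table_b": {2, 4},
    "table_c": {2, 4},
    "table_d": {2, 3, 4},
    "table_e": {2, 4},
}

# IDs present in at least one table (what the FULL OUTER JOIN yields).
all_ids = set().union(*tables.values())

# IDs present in every table (what the chain of INNER JOINs yields).
common_ids = set.intersection(*tables.values())

# The "bad" records: present somewhere, but not everywhere.
bad_ids = all_ids - common_ids

# For each bad ID, the tables it is missing from.
missing_from = {
    i: sorted(name for name, ids in tables.items() if i not in ids)
    for i in sorted(bad_ids)
}
```

Note that with more than two sets, "union minus intersection" is not the same as chaining a pairwise symmetric difference (Python's `^` operator), which keeps the IDs found in an odd number of sets. On this test case the chained version would give {1, 4} rather than {1, 2, 3}.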

If you need a tabular result, you can adapt either query by adding a bunch of CASE expressions. Something like this:

SELECT ID,
    CASE when table_a.rowid is not null then 1 else 0 END table_a,
    CASE when table_b.rowid is not null then 1 else 0 END table_b,
    CASE when table_c.rowid is not null then 1 else 0 END table_c,
    CASE when table_d.rowid is not null then 1 else 0 END table_d,
    CASE when table_e.rowid is not null then 1 else 0 END table_e
FROM table_a
    FULL OUTER JOIN table_b USING(id) 
    FULL OUTER JOIN table_c USING(id) 
    FULL OUTER JOIN table_d USING(id) 
    FULL OUTER JOIN table_e USING(id) 
WHERE table_a.ROWID IS NULL
   OR table_b.ROWID IS NULL
   OR table_c.ROWID IS NULL
   OR table_d.ROWID IS NULL
   OR table_e.ROWID IS NULL;

Producing:

ID    TABLE_A TABLE_B TABLE_C TABLE_D TABLE_E
1     1       0       0       0       0
3     1       0       0       1       0
2     0       1       1       1       1
Legend: 1 = entry present in the table, 0 = entry missing

Answer 1: (score: 2)

You can try the following query:

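-- UNION ALL keeps duplicates, so COUNT gives the number of tables containing each id
SELECT id, COUNT(id) AS id_num
FROM (
    SELECT id FROM table_a
    UNION ALL
    SELECT id FROM table_b
    UNION ALL
    SELECT id FROM table_c
    UNION ALL
    SELECT id FROM table_d
    UNION ALL
    SELECT id FROM table_e
)
GROUP BY id
-- a column alias cannot be referenced in HAVING, so the aggregate is repeated
HAVING COUNT(id) < 5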
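
One detail worth checking in a query of this shape is the choice between UNION and UNION ALL: UNION removes duplicates, so every ID ends up counted exactly once. The sketch below runs both variants against an in-memory SQLite database, used here only as a convenient stand-in for Oracle. The sample data is an assumption (it reuses the test case from answer 0), and the helper names `make_db` and `incomplete_ids` are hypothetical.

```python
import sqlite3

# Assumed sample data: which IDs exist in which table.
SAMPLE = {
    "table_a": [1, 3, 4],
    "table_b": [2, 4],
    "table_c": [2, 4],
    "table_d": [2, 3, 4],
    "table_e": [2, 4],
}


def make_db():
    """Create an in-memory database holding the five sample tables."""
    conn = sqlite3.connect(":memory:")
    for name, ids in SAMPLE.items():
        conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY)")
        conn.executemany(f"INSERT INTO {name} (id) VALUES (?)", [(i,) for i in ids])
    return conn


def incomplete_ids(conn, set_operator):
    """Count, per ID, the rows of the stacked SELECTs and keep counts below 5."""
    stacked = f" {set_operator} ".join(f"SELECT id FROM {name}" for name in SAMPLE)
    query = f"""
        SELECT id, COUNT(id) AS id_num
        FROM ({stacked})
        GROUP BY id
        HAVING COUNT(id) < {len(SAMPLE)}
        ORDER BY id
    """
    return conn.execute(query).fetchall()


conn = make_db()
with_union_all = incomplete_ids(conn, "UNION ALL")  # one row per (table, id) pair
with_union = incomplete_ids(conn, "UNION")          # duplicates removed before counting
```

With UNION ALL, `id_num` is the number of tables that contain the ID, so the filter keeps only the incomplete IDs. With plain UNION, every count is 1 and even the ID that is present in all five tables passes the filter.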

Answer 2: (score: 1)

Try this:

SELECT id FROM (
    SELECT id FROM table_a
    UNION
    SELECT id FROM table_b
    UNION
    SELECT id FROM table_c
    UNION
    SELECT id FROM table_d
    UNION
    SELECT id FROM table_e
) result
WHERE id NOT IN (
    SELECT a.id
    FROM table_a a
    INNER JOIN table_b b ON a.id = b.id
    INNER JOIN table_c c ON a.id = c.id
    INNER JOIN table_d d ON a.id = d.id
    INNER JOIN table_e e ON a.id = e.id
)

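Answer 3: (score: 0)

If I understand correctly, you can use outer joins to determine which rows have no matching primary (or unique) key. For example, the following query uses a left join to find the rows of table a that have no match in table b: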

select a.id from a left join b on a.id=b.id where b.id is null

Conversely, to find the rows of table b that have no match in table a:

select b.id from a right join b on a.id=b.id where a.id is null
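
The pattern can be tried on a throwaway database. The following is a minimal sketch using Python's sqlite3 module as a stand-in for Oracle; the table contents are assumed sample data, and the helper `ids` is hypothetical. Because older SQLite versions do not support RIGHT JOIN, the second query is written as the equivalent LEFT JOIN with the tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    INSERT INTO a (id) VALUES (1), (3), (4);
    INSERT INTO b (id) VALUES (2), (4);
""")


def ids(query):
    """Run a single-column query and return its values as a list."""
    return [row[0] for row in conn.execute(query)]


# Rows of a with no partner in b: the outer join leaves b.id NULL for them.
only_in_a = ids(
    "SELECT a.id FROM a LEFT JOIN b ON a.id = b.id "
    "WHERE b.id IS NULL ORDER BY a.id"
)

# Mirror image: rows of b with no partner in a (LEFT JOIN with the tables swapped).
only_in_b = ids(
    "SELECT b.id FROM b LEFT JOIN a ON a.id = b.id "
    "WHERE a.id IS NULL ORDER BY b.id"
)
```

Each query covers a single pair of tables, so with five tables this check would have to be repeated for every pair of interest.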

Answer 4: (score: 0)

This solution will tell you, for each ID, which tables are missing a row:

SELECT   *
FROM     (SELECT id, 'table_a' AS table_name FROM table_a
          UNION ALL
          SELECT id, 'table_b' FROM table_b
          UNION ALL
          SELECT id, 'table_c' FROM table_c
          UNION ALL
          SELECT id, 'table_d' FROM table_d
          UNION ALL
          SELECT id, 'table_e' FROM table_e) PIVOT (COUNT (*)
                                             FOR table_name
                                             IN  ('table_a' AS table_a,
                                                 'table_b' AS table_b,
                                                 'table_c' AS table_c,
                                                 'table_d' AS table_d,
                                                 'table_e' AS table_e))
WHERE    table_a + table_b + table_c + table_d + table_e < 5
ORDER BY id

Example result:

ID | TABLE_A | TABLE_B | TABLE_C | TABLE_D | TABLE_E
0  |       1 |       0 |       0 |       1 |       0
1  |       0 |       1 |       0 |       1 |       0
2  |       1 |       1 |       0 |       0 |       0
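
For engines without a PIVOT clause, the same presence matrix can be approximated with conditional aggregation (a SUM over a CASE expression per table). This is a different technique from the PIVOT query above, shown only as a sketch: it runs on an in-memory SQLite database through Python's sqlite3 module, and the sample data is an assumption that reuses the test case from answer 0.

```python
import sqlite3

# Assumed sample data: which IDs exist in which table.
SAMPLE = {
    "table_a": [1, 3, 4],
    "table_b": [2, 4],
    "table_c": [2, 4],
    "table_d": [2, 3, 4],
    "table_e": [2, 4],
}

conn = sqlite3.connect(":memory:")
for name, id_list in SAMPLE.items():
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY)")
    conn.executemany(f"INSERT INTO {name} (id) VALUES (?)", [(i,) for i in id_list])

# One (id, table_name) row per table that contains the ID.
stacked = " UNION ALL ".join(
    f"SELECT id, '{name}' AS table_name FROM {name}" for name in SAMPLE
)

# One 0/1 column per table, standing in for the PIVOT ... IN (...) list.
flags = ", ".join(
    f"SUM(CASE WHEN table_name = '{name}' THEN 1 ELSE 0 END) AS {name}"
    for name in SAMPLE
)

query = f"""
    SELECT id, {flags}
    FROM ({stacked})
    GROUP BY id
    HAVING COUNT(*) < {len(SAMPLE)}
    ORDER BY id
"""

# Each tuple is (id, table_a, table_b, table_c, table_d, table_e).
presence = conn.execute(query).fetchall()
```

Since ID is a primary key in every table, each flag is either 0 or 1, and `COUNT(*)` per ID equals the number of tables that contain it.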