Oracle FULL OUTER JOIN on three tables with two conditions

Date: 2018-08-21 13:13:27

Tags: sql oracle oracle12c

Background

Oracle DB version:

SELECT * FROM v$version
WHERE banner LIKE 'Oracle%';
-- OUTPUT
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

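Goal

I am trying to do an outer join of three tables with two conditions, so that missing values simply show up as NULL. See the details below.

Tables

The tables below are abstracted, so please do not try to improve the data model itself.

MEASUREMENT

Primary key = ID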

|  ID  |    MEAS_NAME    |
|------|-----------------|
| 1000 | "Measurement 1" |

MEASUREMENT_AREA

Primary key = (ID, NAME)
Foreign key ID = MEASUREMENT.ID

|  ID  |    NAME   | AREA |
|------|-----------|------|
| 1000 | "Point 1" |   10 |
| 1000 | "Point 2" |   20 |

MEASUREMENT_VOLUME

Primary key = (ID, NAME)
Foreign key ID = MEASUREMENT.ID

|  ID  |    NAME   | VOLUME |
|------|-----------|--------|
| 1000 | "Point 1" |    100 |
| 1000 | "Point 3" |    200 |

Expected result

What I want is the following output:

|  ID  |    MEAS_NAME    |    NAME   | AREA | VOLUME |
|------|-----------------|-----------|------|--------|
| 1000 | "Measurement 1" | "Point 1" | 10   | 100    |
| 1000 | "Measurement 1" | "Point 2" | 20   | NULL   |
| 1000 | "Measurement 1" | "Point 3" | NULL | 200    |

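This means that if, for a particular MEASUREMENT.ID and a particular NAME, there is data in both AREA and VOLUME, they are put in the same row. Otherwise, the missing AREA or VOLUME field is simply left empty.

Query 1

I came up with the following SQL statement, which does not work: it drops the results from MEASUREMENT_VOLUME.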

SELECT meas.ID AS "ID",
    meas.MEAS_NAME AS "MEAS_NAME",
    COALESCE (area.NAME, vol.NAME) as "NAME",
    area.AREA, vol.VOLUME
FROM MEASUREMENT meas
  LEFT JOIN MEASUREMENT_AREA area
    ON meas.ID = area.ID
  FULL JOIN MEASUREMENT_VOLUME vol
    ON meas.ID = vol.ID AND area.NAME = vol.NAME
WHERE meas.ID = 1000;

Query 2

If I put MEASUREMENT last, it works, but the query is very slow:

SELECT meas.ID AS "ID",
    meas.MEAS_NAME AS "MEAS_NAME",
    COALESCE (area.NAME, vol.NAME) as "NAME",
    area.AREA, vol.VOLUME
FROM MEASUREMENT_AREA area
    FULL JOIN MEASUREMENT_VOLUME vol
        ON area.ID = vol.ID AND area.NAME = vol.NAME
    JOIN MEASUREMENT meas
        ON meas.ID = vol.ID OR meas.ID = area.ID
WHERE meas.ID = 1000;

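Questions

  • Why does query 1 not work?
  • Why does query 2 work?
  • What is the most efficient way to achieve the output?

Any help is much appreciated; I am not an SQL expert.

Additional information

  • A row in MEASUREMENT contains the metadata for one measurement.
  • A measurement can contain hundreds of measurement points, which are distinguished by their "name".
  • MEASUREMENT_AREA and MEASUREMENT_VOLUME are much larger than MEASUREMENT; each contains more than 10 million rows.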

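4 answers:

Answer 0 (score: 6)

Why one query works and the other does not is already explained in the other answer, so I will only add how to write the query:

You need a full outer join of measurement_area and measurement_volume. Do this in a subquery, then join the result with the measurement table: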

select id, m.meas_name, data.name, data.area, data.volume
from measurement m
join 
(
  select id, name, ma.area, mv.volume
  from measurement_area ma
  full outer join measurement_volume mv using (id, name)
) data using(id);

Answer 1 (score: 4)

Why does query 1 not work?

...
ON meas.ID = vol.ID AND area.NAME = vol.name
...
where meas.ID = 1000

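Your full join condition includes area.NAME = vol.NAME, which means the row in MEASUREMENT_VOLUME with the name "Point 3" has no match. Because it is a full join, you still get that row, but since it does not satisfy the join condition, only the columns from that table have values: meas.ID is NULL, as are MEAS_NAME and AREA. The WHERE clause then filters out every row whose ID is not equal to 1000, and a NULL meas.ID never satisfies meas.ID = 1000. If you remove the WHERE clause from that query, you get: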

ID      MEAS_NAME       NAME    AREA    VOLUME
1000    Measurement 1   Point 1 10      100
                        Point 3         200
1000    Measurement 1   Point 2 20  
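
The effect of the NULL meas.ID can be reproduced on the sample data. The sketch below is only an illustration: it inlines the question's rows as CTEs named after the tables, so it runs without the real tables, and it swaps the filter for COALESCE(meas.ID, vol.ID) = 1000, a variation that is not taken from the question.

```sql
-- Sample data from the question, inlined as CTEs (illustrative setup).
WITH MEASUREMENT AS (
  SELECT 1000 AS ID, 'Measurement 1' AS MEAS_NAME FROM dual
), MEASUREMENT_AREA AS (
  SELECT 1000 AS ID, 'Point 1' AS NAME, 10 AS AREA FROM dual UNION ALL
  SELECT 1000 AS ID, 'Point 2' AS NAME, 20 AS AREA FROM dual
), MEASUREMENT_VOLUME AS (
  SELECT 1000 AS ID, 'Point 1' AS NAME, 100 AS VOLUME FROM dual UNION ALL
  SELECT 1000 AS ID, 'Point 3' AS NAME, 200 AS VOLUME FROM dual
)
-- Query 1 with only the filter changed: a row is kept when either
-- meas.ID or vol.ID is 1000, so the unmatched volume row is not dropped.
SELECT meas.ID AS "ID",
       meas.MEAS_NAME AS "MEAS_NAME",
       COALESCE(area.NAME, vol.NAME) AS "NAME",
       area.AREA, vol.VOLUME
FROM MEASUREMENT meas
  LEFT JOIN MEASUREMENT_AREA area
    ON meas.ID = area.ID
  FULL JOIN MEASUREMENT_VOLUME vol
    ON meas.ID = vol.ID AND area.NAME = vol.NAME
WHERE COALESCE(meas.ID, vol.ID) = 1000;
```

With this filter the "Point 3" row survives, but its ID and MEAS_NAME columns are still NULL, because the full join found no MEASUREMENT row to pair it with. Changing the WHERE clause alone therefore does not produce the expected output.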

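Why does query 2 work?

Basically because it answers the question correctly. In that query you seem to have recognized that area.ID and vol.ID are not always both available, so you match MEASUREMENT against either of them in the join, which is why your query works.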
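
The same idea can be written without OR. Every row produced by the full join has at least one of area.ID and vol.ID populated, and when both are present they are equal, so COALESCE(area.ID, vol.ID) always yields the measurement ID. The sketch below is a variant of query 2 under that observation, run against the question's sample data inlined as CTEs (an illustrative setup); whether the optimizer handles it better than the OR form on the real tables is not something this thread establishes.

```sql
-- Sample data from the question, inlined as CTEs (illustrative setup).
WITH MEASUREMENT AS (
  SELECT 1000 AS ID, 'Measurement 1' AS MEAS_NAME FROM dual
), MEASUREMENT_AREA AS (
  SELECT 1000 AS ID, 'Point 1' AS NAME, 10 AS AREA FROM dual UNION ALL
  SELECT 1000 AS ID, 'Point 2' AS NAME, 20 AS AREA FROM dual
), MEASUREMENT_VOLUME AS (
  SELECT 1000 AS ID, 'Point 1' AS NAME, 100 AS VOLUME FROM dual UNION ALL
  SELECT 1000 AS ID, 'Point 3' AS NAME, 200 AS VOLUME FROM dual
)
SELECT meas.ID AS "ID",
       meas.MEAS_NAME AS "MEAS_NAME",
       COALESCE(area.NAME, vol.NAME) AS "NAME",
       area.AREA, vol.VOLUME
FROM MEASUREMENT_AREA area
  FULL JOIN MEASUREMENT_VOLUME vol
    ON area.ID = vol.ID AND area.NAME = vol.NAME
  -- One equality instead of "meas.ID = vol.ID OR meas.ID = area.ID".
  JOIN MEASUREMENT meas
    ON meas.ID = COALESCE(area.ID, vol.ID)
WHERE meas.ID = 1000;
```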

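What is the most efficient way to achieve the output?

That is hard to answer without more information: what does your execution plan look like? Which indexes are available? Which ones are being used?

My guess is that the full join is performed first, so you are doing it on the 2 large tables and only then joining back to the first table. Updating the statistics on the tables might fix the performance problem of query 2, or a deeper analysis may be needed.

Edited to add: here is another correct version of the query, which might perform faster than query 2. It takes the OR out of the join condition; an OR there can sometimes give the optimizer a hard time.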

with MEASUREMENT as
(
  select 1000 as ID, 'Measurement 1' as MEAS_NAME from dual
), MEASUREMENT_AREA as
(
   select 1000 as ID, 'Point 1' as NAME, 10 as AREA from dual union all
   select 1000 as ID, 'Point 2' as NAME, 20 as AREA from dual
), MEASUREMENT_VOLUME as
(
   select 1000 as ID, 'Point 1' as NAME, 100 as VOLUME from dual union all
   select 1000 as ID, 'Point 3' as NAME, 200 as VOLUME from dual
),
base_qry as (
    select meas.ID, meas_name, area.name, area, null as volume
    FROM MEASUREMENT meas
      LEFT JOIN MEASUREMENT_AREA area
        ON meas.ID = area.ID
    WHERE meas.ID = 1000

    union all 

    select meas.ID, meas_name, vol.name, null, volume
    FROM MEASUREMENT meas
      LEFT JOIN MEASUREMENT_VOLUME vol
        ON meas.ID = vol.ID
    WHERE meas.ID = 1000)
select ID, MEAS_NAME, NAME,
    max(AREA) as AREA,
    max(VOLUME) as VOLUME
from base_qry
group by ID, MEAS_NAME, NAME
order by 1,2,3
;

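Answer 2 (score: 0)

I basically combined the answers from @dandarc and @thorsten-kettner (thank you very much for your valuable input):

Because MEASUREMENT_VOLUME and MEASUREMENT_AREA are much larger than MEASUREMENT, I split the JOIN: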

SELECT *
FROM 
(
  SELECT *
  FROM MEASUREMENT
  JOIN MEASUREMENT_AREA
    USING(ID)
  WHERE ID = 1000
)
FULL JOIN
(
  SELECT *
  FROM MEASUREMENT
  JOIN MEASUREMENT_VOLUME
    USING(ID)
  WHERE ID = 1000
) USING (ID, MEAS_NAME, NAME);

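For my purposes, it was important to first join each large table to MEASUREMENT and then combine those results (as @dandarc suggested, this could also be done with UNION ALL and GROUP BY).

This effectively solved my problem. With query 2, the FULL JOIN over the three tables took more than 3 minutes. With this solution it takes a few seconds.

Note that my real-life problem is more complex, because I need to select dozens of columns and cannot simply use SELECT *. Therefore I cannot use USING (ID, MEAS_NAME, NAME) and need to stick to the ON syntax.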
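
A minimal sketch of what the ON form might look like with an explicit column list. The aliases a, v, m, ma and mv are illustrative and not from the original query, and the question's sample data is inlined as CTEs so the statement runs on its own. With ON instead of USING, the shared columns have to be merged by hand with COALESCE, since either side of the full join can be NULL.

```sql
-- Sample data from the question, inlined as CTEs (illustrative setup).
WITH MEASUREMENT AS (
  SELECT 1000 AS ID, 'Measurement 1' AS MEAS_NAME FROM dual
), MEASUREMENT_AREA AS (
  SELECT 1000 AS ID, 'Point 1' AS NAME, 10 AS AREA FROM dual UNION ALL
  SELECT 1000 AS ID, 'Point 2' AS NAME, 20 AS AREA FROM dual
), MEASUREMENT_VOLUME AS (
  SELECT 1000 AS ID, 'Point 1' AS NAME, 100 AS VOLUME FROM dual UNION ALL
  SELECT 1000 AS ID, 'Point 3' AS NAME, 200 AS VOLUME FROM dual
)
SELECT COALESCE(a.ID, v.ID)               AS ID,
       COALESCE(a.MEAS_NAME, v.MEAS_NAME) AS MEAS_NAME,
       COALESCE(a.NAME, v.NAME)           AS NAME,
       a.AREA,
       v.VOLUME
FROM (
  -- Small table joined to the first large table, filtered early.
  SELECT m.ID, m.MEAS_NAME, ma.NAME, ma.AREA
  FROM MEASUREMENT m
  JOIN MEASUREMENT_AREA ma ON ma.ID = m.ID
  WHERE m.ID = 1000
) a
FULL JOIN (
  -- Small table joined to the second large table, filtered early.
  SELECT m.ID, m.MEAS_NAME, mv.NAME, mv.VOLUME
  FROM MEASUREMENT m
  JOIN MEASUREMENT_VOLUME mv ON mv.ID = m.ID
  WHERE m.ID = 1000
) v
  ON a.ID = v.ID AND a.NAME = v.NAME
ORDER BY NAME;
```

MEAS_NAME is left out of the ON condition here because it is determined by ID; that is a simplification relative to the USING (ID, MEAS_NAME, NAME) form above.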

Answer 3 (score: -1)

Try this:

SELECT meas.ID AS "ID",
meas.MEAS_NAME AS "MEAS_NAME",
COALESCE (area.NAME, vol.NAME) as "NAME",
area.AREA, vol.VOLUME
FROM MEASUREMENT meas
LEFT JOIN MEASUREMENT_AREA area
ON meas.ID = area.ID
LEFT JOIN MEASUREMENT_VOLUME vol
ON meas.ID = vol.ID
WHERE meas.ID = 1000;

Just remove area.NAME = vol.NAME from your first query.
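
One way to check this suggestion is to run it against the question's sample data. The sketch below inlines that data as CTEs named after the tables (an illustrative setup) and otherwise uses the query above unchanged.

```sql
-- Sample data from the question, inlined as CTEs (illustrative setup).
WITH MEASUREMENT AS (
  SELECT 1000 AS ID, 'Measurement 1' AS MEAS_NAME FROM dual
), MEASUREMENT_AREA AS (
  SELECT 1000 AS ID, 'Point 1' AS NAME, 10 AS AREA FROM dual UNION ALL
  SELECT 1000 AS ID, 'Point 2' AS NAME, 20 AS AREA FROM dual
), MEASUREMENT_VOLUME AS (
  SELECT 1000 AS ID, 'Point 1' AS NAME, 100 AS VOLUME FROM dual UNION ALL
  SELECT 1000 AS ID, 'Point 3' AS NAME, 200 AS VOLUME FROM dual
)
-- Both joins match on ID only; NAME is no longer compared.
SELECT meas.ID AS "ID",
       meas.MEAS_NAME AS "MEAS_NAME",
       COALESCE(area.NAME, vol.NAME) AS "NAME",
       area.AREA, vol.VOLUME
FROM MEASUREMENT meas
  LEFT JOIN MEASUREMENT_AREA area
    ON meas.ID = area.ID
  LEFT JOIN MEASUREMENT_VOLUME vol
    ON meas.ID = vol.ID
WHERE meas.ID = 1000;
```

Because both joins match on ID alone, each of the two MEASUREMENT_AREA rows is paired with each of the two MEASUREMENT_VOLUME rows. The result has four rows, none of them named "Point 3", rather than the three rows in the expected output.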