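I am doing a full outer join of 4 tables on the same column. I want to produce only one row for each distinct value of the join column.
The inputs are: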
employee1
+---------------------+-----------------+--+
| employee1.personid | employee1.name |
+---------------------+-----------------+--+
| 111 | aaa |
| 222 | bbb |
| 333 | ccc |
+---------------------+-----------------+--+
employee2
+---------------------+----------------+--+
| employee2.personid | employee2.sal |
+---------------------+----------------+--+
| 111 | 2 |
| 200 | 3 |
+---------------------+----------------+--+
employee3
+---------------------+------------------+--+
| employee3.personid | employee3.place |
+---------------------+------------------+--+
| 111 | bbsr |
| 300 | atl |
| 200 | ny |
+---------------------+------------------+--+
employee4
+---------------------+---------------+--+
| employee4.personid | employee4.dt |
+---------------------+---------------+--+
| 111 | 2019-02-21 |
| 300 | 2019-03-18 |
| 400 | 2019-03-18 |
+---------------------+---------------+--+
Expected result: one record per personid, so there should be 6 records in total (111, 222, 333, 200, 300, 400), like:
+-----------+---------+--------+----------+-------------+--+
| personid | f.name | u.sal | v.place | v_in.dt |
+-----------+---------+--------+----------+-------------+--+
| 111 | aaa | 2 | bbsr | 2019-02-21 |
| 200 | NULL | 3 | ny | NULL |
| 222 | bbb | NULL | NULL | NULL |
| 300 | NULL | NULL | atl | 2019-03-18 |
| 333 | ccc | NULL | NULL | NULL |
| 400 | NULL | NULL | NULL | 2019-03-18 |
+-----------+---------+--------+----------+-------------+--+
The result I am getting is:
+-----------+---------+--------+----------+-------------+--+
| personid | f.name | u.sal | v.place | v_in.dt |
+-----------+---------+--------+----------+-------------+--+
| 111 | aaa | 2 | bbsr | 2019-02-21 |
| 200 | NULL | 3 | NULL | NULL |
| 200 | NULL | NULL | ny | NULL |
| 222 | bbb | NULL | NULL | NULL |
| 300 | NULL | NULL | atl | NULL |
| 300 | NULL | NULL | NULL | 2019-03-18 |
| 333 | ccc | NULL | NULL | NULL |
| 400 | NULL | NULL | NULL | 2019-03-18 |
+-----------+---------+--------+----------+-------------+--+
Query used:
select coalesce(f.personid, u.personid, v.personid, v_in.personid) as personid,f.name,u.sal,v.place,v_in.dt
from employee1 f FULL OUTER JOIN employee2 u on f.personid=u.personid
FULL OUTER JOIN employee3 v on f.personid=v.personid
FULL OUTER JOIN employee4 v_in on f.personid=v_in.personid;
Please suggest how to produce the expected result.
Answer 0: (score: 0)
FULL OUTER JOIN is tricky because you have to account for the NULLs produced by the earlier joins. But you can do this:
select coalesce(f.personid, u.personid, v.personid, v_in.personid) as personid, f.name, u.sal, v.place, v_in.dt
from employee1 f FULL OUTER JOIN
employee2 u
on f.personid = u.personid FULL OUTER JOIN
employee3 v
on v.personid in (f.personid, u.personid) FULL OUTER JOIN
employee4 v_in
on v_in.personid in (f.personid, u.personid, v.personid);
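An equivalent way to write the same join conditions, as a sketch only: after each full outer join, a row carries at most one distinct non-NULL key among the earlier tables (assuming personid is unique within each table, as in the sample data), so COALESCE over the earlier keys can stand in for the IN list. This keeps every ON clause a plain equality, which may matter on engines that restrict join conditions; whether your Hive version accepts it is an assumption to verify.

```sql
-- Sketch: same result as the IN-list version, assuming personid is unique per table.
select coalesce(f.personid, u.personid, v.personid, v_in.personid) as personid,
       f.name, u.sal, v.place, v_in.dt
from employee1 f
full outer join employee2 u
  on u.personid = f.personid
full outer join employee3 v
  -- match against whichever earlier table supplied the key
  on v.personid = coalesce(f.personid, u.personid)
full outer join employee4 v_in
  on v_in.personid = coalesce(f.personid, u.personid, v.personid);
```

With the sample data, 200 from employee3 now matches the employee2 row and 300 from employee4 matches the employee3 row, so each id appears once.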
In databases that support USING (instead of ON) for a JOIN, the full outer join chain is simpler to write. However, I don't think Hive supports USING.
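For illustration, this is roughly what the USING form looks like in a database that supports it, such as PostgreSQL (not Hive). With USING, the join column is merged across both sides, so each later join compares against the merged personid and no explicit COALESCE is needed. Treat it as a sketch against the tables from the question.

```sql
-- Sketch for PostgreSQL-style engines that support JOIN ... USING.
-- The unqualified personid is the merged join column.
select personid, f.name, u.sal, v.place, v_in.dt
from employee1 f
full outer join employee2 u using (personid)
full outer join employee3 v using (personid)
full outer join employee4 v_in using (personid)
order by personid;
```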
Answer 1: (score: 0)
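FULL JOIN returns all joined rows + all unjoined rows from the left table + all unjoined rows from the right table. And since you are joining employee2, employee3, and employee4 to the same employee1 table, which does not contain personid=200, the unjoined rows from all four tables are returned. An id that is missing from employee1 never matches on f.personid, so it comes back once for each table that contains it.
I suggest a UNION of all four tables, providing NULLs for the missing fields, plus aggregation with GROUP BY personid.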
This will perform better than the joins.
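A minimal sketch of that UNION plus GROUP BY idea, using the column names from the question. The answer's own query is not shown here, so this is an illustration rather than the answerer's exact code. The casts are assumptions: name, place, and dt are treated as strings and sal as an int, so adjust them to the real column types (for example, cast(null as date) if dt is a DATE). UNION ALL is used because the GROUP BY collapses the rows anyway.

```sql
-- Sketch: stack all four tables into one shape, then collapse to one row per personid.
-- max() ignores NULLs, so it picks the single non-NULL value per column, if any.
select personid,
       max(name)  as name,
       max(sal)   as sal,
       max(place) as place,
       max(dt)    as dt
from (
  select personid, name,
         cast(null as int) as sal,
         cast(null as string) as place,
         cast(null as string) as dt
  from employee1
  union all
  select personid, cast(null as string), sal,
         cast(null as string), cast(null as string)
  from employee2
  union all
  select personid, cast(null as string), cast(null as int),
         place, cast(null as string)
  from employee3
  union all
  select personid, cast(null as string), cast(null as int),
         cast(null as string), dt
  from employee4
) s
group by personid;
```

If a personid can occur more than once within a single table, max() keeps only one value per column, so this sketch assumes personid is unique per table, as in the sample data.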