How to get the difference between two tables in Hive

Date: 2017-03-12 13:26:16

Tags: sql hive

I have two tables, A and B. I want to get all the entries in A that are not in B. Both tables are partitioned by dt, so I tried the following:

1) select A.* from A left join B on A.key=B.key where B.key is null and A.dt=20170101 and B.dt=20170101  -- wrong result

2) select A.* from A left join B on (A.key=B.key and A.dt=20170101 and B.dt=20170101)  -- wrong result

3) select A1.* from (select * from A where dt=20170101) A1 left join (select * from B where dt=20170101) B1 on A1.key=B1.key  -- correct result

Why don't 1) and 2) work? I'm confused...

1 Answer

Answer 0 (score: 1)

  

1) select A.* from A left join B on A.key=B.key where B.key is null and A.dt=20170101 and B.dt=20170101 -- wrong result

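The conditions B.key is null and B.dt=20170101 in your WHERE clause are mutually exclusive. For a row of A with no match on A.key=B.key, the left join fills every column of B with NULL, so B.dt=20170101 is not true and the row is discarded; for a row that does have a match, B.key is not null. No row can pass both tests, so the query always returns nothing. It is essentially the same as: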

select A.*
from A
  inner join B 
    on 1=0
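To see this outside Hive, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in, assuming SQLite applies the same standard outer-join NULL semantics as Hive for this query. The rows in A and B are hypothetical sample data. It first shows that the unmatched row of A has NULL in B.dt, then runs query 1):

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (key INTEGER, dt INTEGER);
    CREATE TABLE B (key INTEGER, dt INTEGER);
    INSERT INTO A VALUES (1, 20170101), (2, 20170101), (3, 20170101), (1, 20170102);
    INSERT INTO B VALUES (1, 20170101), (3, 20170102);
""")

# Rows of A with no match get NULL for every B column, including B.dt.
joined = conn.execute("""
    SELECT A.key, A.dt, B.key, B.dt
    FROM A LEFT JOIN B ON A.key = B.key
    WHERE A.dt = 20170101 AND B.key IS NULL
""").fetchall()
print(joined)  # [(2, 20170101, None, None)]

# Query 1): B.dt = 20170101 can never hold when B.key IS NULL, so nothing survives.
rows1 = conn.execute("""
    SELECT A.* FROM A LEFT JOIN B ON A.key = B.key
    WHERE B.key IS NULL AND A.dt = 20170101 AND B.dt = 20170101
""").fetchall()
print(rows1)  # []
```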
  

2) select A.* from A left join B on (A.key=B.key and A.dt=20170101 and B.dt=20170101) -- wrong result

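A.dt=20170101 only takes part in the join condition; it does not filter the result. A left join keeps every row of A whether or not the ON condition matches (unmatched rows just get NULLs for the columns of B), so you get all rows of A, from every dt partition.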
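A minimal sketch of query 2), again with Python's sqlite3 standing in for Hive and hypothetical sample tables A and B. Because all the dt predicates sit in the ON clause, they only decide which rows of B may match; every row of A comes back, including the one from the 20170102 partition:

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (key INTEGER, dt INTEGER);
    CREATE TABLE B (key INTEGER, dt INTEGER);
    INSERT INTO A VALUES (1, 20170101), (2, 20170101), (3, 20170101), (1, 20170102);
    INSERT INTO B VALUES (1, 20170101), (3, 20170102);
""")

# Query 2): every predicate is in ON, so nothing removes rows of A.
rows2 = conn.execute("""
    SELECT A.* FROM A LEFT JOIN B
      ON (A.key = B.key AND A.dt = 20170101 AND B.dt = 20170101)
    ORDER BY A.dt, A.key
""").fetchall()
print(rows2)  # [(1, 20170101), (2, 20170101), (3, 20170101), (1, 20170102)]

# The result is exactly the whole of table A.
all_a = conn.execute("SELECT * FROM A ORDER BY dt, key").fetchall()
print(rows2 == all_a)  # True
```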

  

3) select A1.* from (select * from A where dt=20170101) A1 left join (select * from B where dt=20170101) B1 on A1.key=B1.key -- correct result

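Each of the following gives the same result as 3). Like 3), they return every row of A in partition dt = 20170101; to keep only the rows of A that have no match in B, add "and B.key is null" to the WHERE clause: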

select A.*
from A
  left join B
    on A.Key = B.Key
   and B.dt = 20170101
where A.dt = 20170101

select A.*
from A
  left join B
    on A.Key = B.Key
   and A.dt = B.dt
where A.dt = 20170101
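A minimal sketch that checks these rewrites against 3), using Python's sqlite3 as a stand-in for Hive and hypothetical sample tables A and B. It uses dt = 20170101 in both subqueries of 3) and adds an IS NULL filter on B's key to all three queries, which is what keeps only the rows of A with no match in the same partition of B. Key 3 exists in B only under dt 20170102, so it should still count as missing for 20170101:

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (key INTEGER, dt INTEGER);
    CREATE TABLE B (key INTEGER, dt INTEGER);
    INSERT INTO A VALUES (1, 20170101), (2, 20170101), (3, 20170101), (1, 20170102);
    INSERT INTO B VALUES (1, 20170101), (3, 20170102);
""")

queries = {
    # 3): restrict each side to the partition first, then join.
    "subqueries": """
        SELECT A1.* FROM (SELECT * FROM A WHERE dt = 20170101) A1
        LEFT JOIN (SELECT * FROM B WHERE dt = 20170101) B1 ON A1.key = B1.key
        WHERE B1.key IS NULL ORDER BY A1.key
    """,
    # B's partition predicate in ON, A's partition predicate in WHERE.
    "constant_in_on": """
        SELECT A.* FROM A LEFT JOIN B ON A.key = B.key AND B.dt = 20170101
        WHERE A.dt = 20170101 AND B.key IS NULL ORDER BY A.key
    """,
    # Join on the partition column as well as the key.
    "dt_in_on": """
        SELECT A.* FROM A LEFT JOIN B ON A.key = B.key AND A.dt = B.dt
        WHERE A.dt = 20170101 AND B.key IS NULL ORDER BY A.key
    """,
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)  # rows is [(2, 20170101), (3, 20170101)] for every query
```

The design point is where B's partition predicate lives: inside ON or inside the B subquery it only limits which rows can match, while moving B.dt = 20170101 into the WHERE clause next to B.key IS NULL reproduces the empty result of 1).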

This is a SQL Server demo, but it may help illustrate the point: http://rextester.com/JCZENB83359