Query regarding a non-equi join in a left outer join in Hive

Time: 2020-06-10 14:49:09

Tags: sql join hive hql non-equi-join

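I am trying to do a left outer join between two tables with a non-equi join condition, and Hive does not support it. Adding the condition in the where clause instead results in data loss. Please let me know if there is a workaround. Below is a sample code snippet: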

Select B.dt ,D.field, sum(B.qty)
from A INNER join B ON A.dt= B.dt
INNER Join C ON B.nbr=C.nbr
LEFT OUTER JOIN D ON A.nbr2=D.Nbr2
AND B.nbr=D.nbr
-- The non-equi join condition below is not supported
AND B.dt between C.start_date and C.End_Date
-- Need a suggestion for this non-equi join.

Below is the error Hive reports for the non-equi join: FAILED: SemanticException [Error 10017]: Line 9:4 Both left and right aliases encountered in JOIN 'START_DATE'

1 answer:

Answer 0: (score: 0)

There is one approach you can take to do this: the union all / window function approach. I think this is what you want:

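-- Interleave the B fact rows and the D interval rows into one stream.
with t as (
      -- Fact rows from A/B: no interval information yet.
      select a.nbr2, b.nbr, b.dt, null as end_date, null as field, b.qty
      from A join
           B
           on A.dt = B.dt
      union all
      -- Interval rows from D: start_date takes the place of dt, qty is null.
      select d.nbr2, d.nbr, d.start_date, d.end_date, d.field, null
      from D
    )
-- Keep the carried-forward D field only while dt is before the interval's end date.
select dt, (case when dt < d_end_date then d_field end), sum(qty)
from (select t.*,
             last_value(field, true) over (partition by nbr, nbr2 order by dt) as d_field,
             last_value(end_date, true) over (partition by nbr, nbr2 order by dt) as d_end_date
      from t
     ) t
group by dt, (case when dt < d_end_date then d_field end);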

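I'm not 100% sure this is exactly the same; for example, it assumes that there is at most one matching record in D and that the intervals do not overlap. But the idea is to interleave the values and use a window function, namely last_value() with its ignore-nulls option, to get the right values.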
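To see the carry-forward step in isolation, here is a minimal sketch on made-up inline data. The key, dates, and values are hypothetical and not taken from the question's tables, and dates are plain ISO strings for simplicity. It assumes a Hive version that allows select without a from clause and accepts the optional boolean second argument of last_value() for skipping nulls.

```sql
-- Hypothetical toy data: one D-style interval row followed by two B-style fact rows
-- for the same key. None of these values come from the original tables.
with t as (
      select 1 as nbr, '2020-01-01' as dt, '2020-01-31' as end_date,
             'x' as field, cast(null as int) as qty      -- interval row (from "D")
      union all
      select 1, '2020-01-15', cast(null as string), cast(null as string), 5   -- fact row inside the interval
      union all
      select 1, '2020-02-10', cast(null as string), cast(null as string), 7   -- fact row after the interval ends
    )
select nbr, dt, qty,
       -- The second argument true tells last_value to skip nulls, so each fact row
       -- picks up the field and end_date of the most recent interval row before it.
       last_value(field, true) over (partition by nbr order by dt) as d_field,
       last_value(end_date, true) over (partition by nbr order by dt) as d_end_date
from t;
```

Both fact rows pick up the interval's values, including the one dated after the interval ends. That is why the outer query above still needs the case expression comparing dt with d_end_date before it uses d_field.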