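Hive window functions

Time: 2019-03-19 14:07:04

Tags: sql date hive

I have data in the following format, and I need to generate the flag_date column based on changes in the value of the flag column.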

login_date      id      flag       flag_date   
5/1/2018        100     Y            NULL 
5/2/2018        100     Y            NULL
5/3/2018        100     N          5/3/2018
5/4/2018        100     N          5/3/2018
5/5/2018        100     Y          5/3/2018
5/6/2018        100     Y          5/3/2018
5/7/2018        100     N          5/7/2018
5/8/2018        100     Y          5/7/2018
5/9/2018        100     Y          5/7/2018
5/10/2018       100     N          5/10/2018

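Initially flag_date is NULL. When flag changes from Y to N, flag_date is set to that row's login_date, and that value is carried forward until the next change from Y to N. Please help.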

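1 answer:

Answer 0 (score: 0)

Your problem looks simple with window functions, but it is tricky: each row depends on the previous record's flag, and within a run of consecutive Y or N rows the value from the first row of the run has to be used.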

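  1. In t1, we take the prior flag (prior_flag) and a first-pass flag date, fg_dt.
  2. In t2, we sort out the consecutive N / Y rows into fg_dt2.
  3. In t3, we look back once more at the sorted-out fg_dt2. The first record in a consecutive Y / N run now has the correct value, which is the one to use for the next Y / N row.
  4. The final query gives you the result in fg_dt4.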

Check this out:

> create table hr02 ( login_date date, id int, flag string, flag_date date );

> insert into hr02 
select '2018-05-01', 100, 'Y', NULL
union all 
select '2018-05-02', 100, 'Y', NULL
union all
select '2018-05-03', 100, 'N', NULL
union all
select '2018-05-04', 100, 'N', NULL
union all
select '2018-05-05', 100, 'Y', NULL
union all
select '2018-05-06', 100, 'Y', NULL
union all
select '2018-05-07', 100, 'N', NULL
union all
select '2018-05-08', 100, 'Y', NULL
union all
select '2018-05-09', 100, 'Y', NULL
union all
select '2018-05-10', 100, 'N', NULL ;


> with t1 as (
   select login_date, id, flag,
          lag(flag) over(order by login_date) as prior_flag,
          case when flag='Y' then lag(login_date) over(order by login_date)
               else login_date end as fg_dt
   from hr02),
 t2 as (
   select login_date, id, flag, prior_flag, fg_dt,
          case when flag='Y' then lag(fg_dt) over(order by login_date)
               when flag='N' and prior_flag='N' then lag(fg_dt) over(order by login_date)
               else login_date end as fg_dt2
   from t1),
 t3 as (
   select login_date, id, flag, prior_flag, fg_dt, fg_dt2,
          case when flag='Y' and prior_flag='N' then lag(fg_dt2) over(order by login_date)
               when flag='N' and prior_flag='N' then lag(fg_dt) over(order by login_date)
               else fg_dt2 end as fg_dt3
   from t2)
 select login_date, id, flag, prior_flag, fg_dt, fg_dt2, fg_dt3,
        case when flag='Y' and prior_flag='Y' then lag(fg_dt3) over(order by login_date)
             else fg_dt3 end as fg_dt4
 from t3 ;


+-------------+------+-------+-------------+-------------+-------------+-------------+-------------+--+
| login_date  |  id  | flag  | prior_flag  |    fg_dt    |   fg_dt2    |   fg_dt3    |   fg_dt4    |
+-------------+------+-------+-------------+-------------+-------------+-------------+-------------+--+
| 2018-05-01  | 100  | Y     | NULL        | NULL        | NULL        | NULL        | NULL        |
| 2018-05-02  | 100  | Y     | Y           | 2018-05-01  | NULL        | NULL        | NULL        |
| 2018-05-03  | 100  | N     | Y           | 2018-05-03  | 2018-05-03  | 2018-05-03  | 2018-05-03  |
| 2018-05-04  | 100  | N     | N           | 2018-05-04  | 2018-05-03  | 2018-05-03  | 2018-05-03  |
| 2018-05-05  | 100  | Y     | N           | 2018-05-04  | 2018-05-04  | 2018-05-03  | 2018-05-03  |
| 2018-05-06  | 100  | Y     | Y           | 2018-05-05  | 2018-05-04  | 2018-05-04  | 2018-05-03  |
| 2018-05-07  | 100  | N     | Y           | 2018-05-07  | 2018-05-07  | 2018-05-07  | 2018-05-07  |
| 2018-05-08  | 100  | Y     | N           | 2018-05-07  | 2018-05-07  | 2018-05-07  | 2018-05-07  |
| 2018-05-09  | 100  | Y     | Y           | 2018-05-08  | 2018-05-07  | 2018-05-07  | 2018-05-07  |
| 2018-05-10  | 100  | N     | Y           | 2018-05-10  | 2018-05-10  | 2018-05-10  | 2018-05-10  |
+-------------+------+-------+-------------+-------------+-------------+-------------+-------------+--+
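
For comparison, the Y-to-N rule from the question can also be sketched as a running maximum instead of a chain of lag() calls. This is not the approach used in the query above; it is a minimal sketch that marks each row where flag is N and the previous flag is Y, then carries the latest marked date forward. The CTE name t, the partition by id clause, and the explicit window frame are assumptions of this sketch, as is a Hive version with windowing support; the table hr02 is the one created above.

```sql
-- Sketch only: flag_date as the most recent Y -> N change date seen so far.
with t as (
  select login_date, id, flag,
         -- previous row's flag within the same id (NULL on the first row)
         lag(flag) over (partition by id order by login_date) as prior_flag
  from hr02
)
select login_date, id, flag,
       -- the case yields login_date only on Y -> N change rows, NULL elsewhere;
       -- max() skips NULLs, so the running max keeps the latest change date
       max(case when flag = 'N' and prior_flag = 'Y' then login_date end)
         over (partition by id order by login_date
               rows between unbounded preceding and current row) as flag_date
from t ;
```

On the ten sample rows this should give the flag_date values listed in the question. Because the running max looks at every earlier row, it does not depend on how long a run of Y or N rows is, whereas the lag chain above looks back a fixed number of rows and the sample data only contains runs of at most two. A first row with flag N has no prior flag, so this sketch does not treat it as a change.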