How do I create logic to merge multiple records with continuous date ranges into a single row?
Given the following sample data:
Member_key start_date end_date
1 1/1/2017 1/31/2017
1 2/1/2017 2/28/2017
1 3/1/2017 3/31/2017
2 1/1/2017 1/31/2017
2 3/1/2017 3/31/2017
the query should ultimately return the following result set:
1 1/1/2017 3/31/2017
2 1/1/2017 1/31/2017
2 3/1/2017 3/31/2017
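I found the following link very helpful, and I am fairly sure I am heading in the right direction, but I get an error when I try to convert the code to Hive SQL: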
http://betteratoracle.com/posts/35-collapsing-continuous-ranges-into-single-rows
Here is where I am stuck (the second-to-last line below, at the order by in my max(grp) over .....):
with data as (
  select
    member_key,
    case
      when datediff(start_date, lag(end_date) over (partition by member_key order by start_date asc)) <= 1 then
        null
      else
        row_number() over ()
    end grp,
    start_date,
    end_date
  from default.eligibility_span_test
  order by member_key, start_date
)
select member_key, start_date, end_date
  , max(grp) over (order by member_key, start_date) sequence
from data
Here are the insert statements I used to add data to the test table:
insert into default.eligibility_span_test values (1, '2017-01-01','2017-01-31');
insert into default.eligibility_span_test values (1, '2017-02-01', '2017-02-28');
insert into default.eligibility_span_test values (1, '2017-03-01', '2017-03-31');
insert into default.eligibility_span_test values (2, '2017-01-01', '2017-01-31');
insert into default.eligibility_span_test values (2, '2017-03-01', '2017-03-31');
Answer 0 (score: 0)
Could you try the following query?
-- Inline sample data standing in for default.eligibility_span_test
with eligibility_span_test as
(
select 1 as Member_key, from_unixtime(unix_timestamp('2017-01-01', 'yyyy-MM-dd'), 'yyyy-MM-dd') as start_date, from_unixtime(unix_timestamp('2017-01-31', 'yyyy-MM-dd'), 'yyyy-MM-dd') end_date
union
select 1 as Member_key, from_unixtime(unix_timestamp('2017-02-01', 'yyyy-MM-dd'), 'yyyy-MM-dd') as start_date, from_unixtime(unix_timestamp('2017-02-28', 'yyyy-MM-dd'), 'yyyy-MM-dd') end_date
union
select 1 as Member_key, from_unixtime(unix_timestamp('2017-03-01', 'yyyy-MM-dd'), 'yyyy-MM-dd') as start_date, from_unixtime(unix_timestamp('2017-03-31', 'yyyy-MM-dd'), 'yyyy-MM-dd') end_date
union
select 2 as Member_key, from_unixtime(unix_timestamp('2017-01-01', 'yyyy-MM-dd'), 'yyyy-MM-dd') as start_date, from_unixtime(unix_timestamp('2017-01-31', 'yyyy-MM-dd'), 'yyyy-MM-dd') end_date
union
select 2 as Member_key, from_unixtime(unix_timestamp('2017-03-01', 'yyyy-MM-dd'), 'yyyy-MM-dd') as start_date, from_unixtime(unix_timestamp('2017-03-31', 'yyyy-MM-dd'), 'yyyy-MM-dd') end_date
),
-- Consecutive months of one member share the same (month - row number) value
res as (
  select member_key,
         month(start_date) - row_number() over (partition by member_key order by start_date) as groupBy,
         start_date,
         end_date
  from eligibility_span_test
)
-- Earliest start and latest end of each continuous run
select member_key, min(start_date), max(end_date)
from res
group by groupBy, member_key;
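The query above returns separate rows for a member whose start and end dates are not continuous, and a single row for a member whose dates are continuous. Note that grouping on month(start_date) minus the row number only works when each row covers one calendar month and all rows fall within the same year.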
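If the spans are not aligned to calendar months, or cross a year boundary, a more general approach is to flag the start of each new span with the same datediff/lag test used in the question and then number the spans with a running sum. The following is only a sketch under assumptions, not a tested Hive script: it assumes start_date and end_date are DATE values or 'yyyy-MM-dd' strings, and the names with_prev, flagged, grouped, prev_end_date, is_new_span, span_id, span_start and span_end are made up for illustration. The lag is computed in its own step so that no window function is nested inside another expression.

```sql
-- Sketch only: assumes start_date and end_date are DATE values or 'yyyy-MM-dd' strings.
with with_prev as (
  select
    member_key,
    start_date,
    end_date,
    -- end_date of the previous span for the same member (NULL on the first row)
    lag(end_date) over (partition by member_key order by start_date) as prev_end_date
  from default.eligibility_span_test
),
flagged as (
  select
    member_key,
    start_date,
    end_date,
    -- 0 when this span starts no later than the day after the previous one ends, else 1
    -- (a NULL prev_end_date makes the comparison NULL, so the first row gets 1)
    case when datediff(start_date, prev_end_date) <= 1 then 0 else 1 end as is_new_span
  from with_prev
),
grouped as (
  select
    member_key,
    start_date,
    end_date,
    -- running count of span starts: every row of one continuous run shares a span_id
    sum(is_new_span) over (partition by member_key order by start_date
                           rows between unbounded preceding and current row) as span_id
  from flagged
)
select
  member_key,
  min(start_date) as span_start,
  max(end_date)   as span_end
from grouped
group by member_key, span_id
order by member_key, span_start;
```

On the sample data this should give one row for member 1 (2017-01-01 to 2017-03-31) and two rows for member 2, matching the result set asked for in the question. Because only the immediately preceding row is compared, spans that are fully nested inside an earlier, longer span may need extra handling.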