Consider the following records in a table:
NAME ID RATE LOC DAY
ABCD 123 -5 NYC 2017-01-01
ABCD 123 -5 NYC 2017-01-02
ABCD 123 -6 SFO 2017-01-03
ABCD 123 -6 DEN 2017-01-04
ABCD 345 -4 ATL 2017-01-05
ABCD 345 -4 WAS 2017-01-06
ABCD 123 -7 CLT 2017-01-07
ABCD 123 -7 CLT 2017-01-08
I want the output to look like this:
NAME ID RATE LOC START_DAY END_DAY
ABCD 123 -5 NYC 2017-01-01 2017-01-02
ABCD 123 -6 SFO 2017-01-03 2017-01-03
ABCD 123 -6 DEN 2017-01-04 2017-01-04
ABCD 345 -4 ATL 2017-01-05 2017-01-05
ABCD 345 -4 WAS 2017-01-06 2017-01-06
ABCD 123 -7 CLT 2017-01-07 2017-01-08
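How can I do this in SQL or Hive? I tried using max over (partition by ...) together with row_number, but it doesn't seem to work. Any ideas are much appreciated.
Here is the SQL I tried: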
select *
from (
    select name
          ,id
          ,min(day) over (partition by name
                                      ,id) as start_date
          ,max(day) over (partition by name
                                      ,id) as end_date
          ,row_number () over (partition by name
                                           ,id
                               order by day asc) as row1
    from table
) a
where row1=1;
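To see why this attempt cannot produce the desired output, the sketch below replays it on the sample rows with Python's built-in sqlite3 module. Assumptions: SQLite 3.25 or later (needed for window functions), the table renamed to tbl because TABLE is a reserved word, only name/id/start_date/end_date selected, and an ORDER BY added for a deterministic result. Because every window is partitioned only by name and id, the two separate runs of ID 123 collapse into one range (2017-01-01 to 2017-01-08), and rows that differ only in RATE or LOC are never split apart.

```python
import sqlite3

# Sample data copied from the question: (name, id, rate, loc, day).
ROWS = [
    ("ABCD", 123, -5, "NYC", "2017-01-01"),
    ("ABCD", 123, -5, "NYC", "2017-01-02"),
    ("ABCD", 123, -6, "SFO", "2017-01-03"),
    ("ABCD", 123, -6, "DEN", "2017-01-04"),
    ("ABCD", 345, -4, "ATL", "2017-01-05"),
    ("ABCD", 345, -4, "WAS", "2017-01-06"),
    ("ABCD", 123, -7, "CLT", "2017-01-07"),
    ("ABCD", 123, -7, "CLT", "2017-01-08"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (name TEXT, id INTEGER, rate INTEGER, loc TEXT, day TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)", ROWS)

# The attempt from the question (table renamed to tbl, ORDER BY added).
# Partitioning only by (name, id) yields one row per (name, id), not one per run.
attempt = conn.execute("""
    SELECT name, id, start_date, end_date
    FROM (
        SELECT name, id,
               MIN(day) OVER (PARTITION BY name, id) AS start_date,
               MAX(day) OVER (PARTITION BY name, id) AS end_date,
               ROW_NUMBER() OVER (PARTITION BY name, id ORDER BY day ASC) AS row1
        FROM tbl
    ) a
    WHERE row1 = 1
    ORDER BY start_date
""").fetchall()
conn.close()

for row in attempt:
    print(row)
```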
Answer:
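This can be done with the difference-of-row-numbers approach (a common gaps-and-islands technique). Within an unbroken run of consecutive rows that share the same key values, both row numbers increase by 1 per row, so their difference stays constant; when the run is interrupted by a row with a different key, the overall row number keeps advancing while the per-key row number does not, so the difference changes. Grouping by the key columns plus that difference therefore gives one row per run. To see how it works, run the inner query and look at the grp column.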
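-- rate and loc are part of the key so that SFO/DEN and ATL/WAS stay separate, as in the expected output
select name, id, rate, loc, min(day) as start_day, max(day) as end_day
from (select t.*,
             row_number() over (order by day)
           - row_number() over (partition by name, id, rate, loc order by day) as grp
      from tbl t
     ) t
group by name, id, rate, loc, grp
order by start_day;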
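A minimal sketch for checking the approach locally, assuming Python's built-in sqlite3 module with SQLite 3.25 or later (for window functions) and a table named tbl loaded with the sample rows. The helper islands() is hypothetical; it runs the same difference-of-row-numbers query with a configurable key. Keyed on name and id only, the query yields three ranges (ID 123 from 01-01 to 01-04, ID 345, then ID 123 again); keyed on name, id, rate and loc, it reproduces the six rows asked for in the question.

```python
import sqlite3

# Sample data copied from the question: (name, id, rate, loc, day).
ROWS = [
    ("ABCD", 123, -5, "NYC", "2017-01-01"),
    ("ABCD", 123, -5, "NYC", "2017-01-02"),
    ("ABCD", 123, -6, "SFO", "2017-01-03"),
    ("ABCD", 123, -6, "DEN", "2017-01-04"),
    ("ABCD", 345, -4, "ATL", "2017-01-05"),
    ("ABCD", 345, -4, "WAS", "2017-01-06"),
    ("ABCD", 123, -7, "CLT", "2017-01-07"),
    ("ABCD", 123, -7, "CLT", "2017-01-08"),
]


def islands(key_cols):
    """Hypothetical helper: collapse consecutive runs of rows sharing key_cols
    into (key..., start_day, end_day) using the difference of row numbers."""
    cols = ", ".join(key_cols)  # fixed column names only, never user input
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (name TEXT, id INTEGER, rate INTEGER, loc TEXT, day TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)", ROWS)
    sql = f"""
        SELECT {cols}, MIN(day) AS start_day, MAX(day) AS end_day
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (ORDER BY day)
                 - ROW_NUMBER() OVER (PARTITION BY {cols} ORDER BY day) AS grp
            FROM tbl t
        ) t
        GROUP BY {cols}, grp
        ORDER BY start_day
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


by_name_id = islands(["name", "id"])
by_full_key = islands(["name", "id", "rate", "loc"])
for row in by_full_key:
    print(row)
```

The key columns decide what counts as "the same run": any column whose change should start a new output row has to be in both the PARTITION BY and the GROUP BY.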