Hive SQL: collapsing consecutive ranges into a single row

Time: 2018-05-10 20:26:08

Tags: mysql sql hadoop hive hiveql

Consider the following records in a table:

NAME    ID      RATE   LOC   DAY
ABCD    123      -5    NYC    2017-01-01
ABCD    123      -5    NYC    2017-01-02
ABCD    123      -6    SFO    2017-01-03
ABCD    123      -6    DEN    2017-01-04
ABCD    345      -4    ATL    2017-01-05
ABCD    345      -4    WAS    2017-01-06
ABCD    123      -7    CLT    2017-01-07
ABCD    123      -7    CLT    2017-01-08

I would like the output to look like this:

NAME    ID      RATE  LOC   START_DAY   END_DAY
ABCD    123      -5   NYC   2017-01-01  2017-01-02
ABCD    123      -6   SFO   2017-01-03  2017-01-03
ABCD    123      -6   DEN   2017-01-04  2017-01-04
ABCD    345      -4   ATL   2017-01-05  2017-01-05
ABCD    345      -4   WAS   2017-01-06  2017-01-06
ABCD    123      -7   CLT   2017-01-07  2017-01-08

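How can I do this in SQL or Hive? I tried using max over partition together with row_number, but it does not seem to work: partitioning by name and id alone puts every row for an id into one group, even when the runs are not consecutive. Any ideas are much appreciated.

Here is the SQL I tried: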

select *     
  from (
        select name
              ,id
              ,min(day) over (partition by name
                                          ,id) as start_date
              ,max(day) over (partition by name
                                          ,id) as end_date                     
              ,row_number () over (partition by name
                                               ,id
                                 order by day asc) as row1
          from table
       ) a
where row1=1;

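1 answer:

Answer 0 (score: 1)

This can be done with the difference-of-row-numbers approach: within a run of consecutive rows that share the same key, the overall row number and the per-key row number increase together, so their difference stays constant and can serve as a group identifier. To see how it works, run the inner query on its own and look at the results.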

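select name, id, rate, loc,
       min(day) as start_day,
       max(day) as end_day
from (select t.*,
             -- the difference is constant within each consecutive run
             row_number() over (order by day)
             - row_number() over (partition by name, id, rate, loc order by day) as grp
      from tbl t
     ) t
group by name, id, rate, loc, grp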
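
To make the "run the inner query" suggestion concrete, here is a minimal sketch that loads the sample rows from the question and prints the intermediate grp values followed by the collapsed result. It makes several assumptions that are not in the original post: SQLite (through Python's sqlite3 module, SQLite 3.25 or later for window functions) stands in for Hive, the table is called tbl as in the answer, and the partition includes rate and loc so that the result matches the output requested in the question.

```python
import sqlite3

# Sample rows copied from the question: (name, id, rate, loc, day).
rows = [
    ("ABCD", 123, -5, "NYC", "2017-01-01"),
    ("ABCD", 123, -5, "NYC", "2017-01-02"),
    ("ABCD", 123, -6, "SFO", "2017-01-03"),
    ("ABCD", 123, -6, "DEN", "2017-01-04"),
    ("ABCD", 345, -4, "ATL", "2017-01-05"),
    ("ABCD", 345, -4, "WAS", "2017-01-06"),
    ("ABCD", 123, -7, "CLT", "2017-01-07"),
    ("ABCD", 123, -7, "CLT", "2017-01-08"),
]

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tbl (name text, id integer, rate integer, loc text, day text)"
)
conn.executemany("insert into tbl values (?, ?, ?, ?, ?)", rows)

# Inner query: overall row number minus per-key row number.
INNER = """
select t.*,
       row_number() over (order by day)
       - row_number() over (partition by name, id, rate, loc order by day) as grp
from tbl t
"""

# Outer query: one row per (key, grp) with the first and last day of the run.
OUTER = f"""
select name, id, rate, loc, min(day) as start_day, max(day) as end_day
from ({INNER}) t
group by name, id, rate, loc, grp
order by start_day
"""

inner_rows = conn.execute(INNER + " order by day").fetchall()
collapsed = conn.execute(OUTER).fetchall()

# grp per input row, in day order: 0, 0, 2, 3, 4, 5, 6, 6
for r in inner_rows:
    print(r)

# Six collapsed rows, one per consecutive run.
for r in collapsed:
    print(r)
```

The two NYC rows share grp 0 and the two CLT rows share grp 6, while every other row gets its own value, which is why grouping by the key columns plus grp yields the six rows the question asks for. Note that the unpartitioned row_number() over (order by day) relies on day being unique across the table; with several names or tied dates, the overall numbering would need a wider ordering.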