SQL: collapse dates into "start - end" ranges

Time: 2016-07-13 18:17:22

Tags: sql date range impala nosql

I have a table in which the same row is repeated across multiple dates:

 ID      STATE       DATE
----------------------------
id01   connected  2015-04-04
id01   connected  2015-04-05
id01   connected  2015-04-08
id01   disconect  2015-04-11
id01   disconect  2015-04-12
id01   connected  2015-04-13

I want a query that returns a "start date" and an "end date", with a result like this:

 ID      STATE    START DATE   END DATE
----------------------------------------
id01   connected  2015-04-04  2015-04-10
id01   disconect  2015-04-11  2015-04-12
id01   connected  2015-04-13  XXXXXXXXXX

The last "end date" does not matter (it could be the last value, null, now()...).

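The most important thing is to detect the dates on which the state changes (in this example, there is no row for 2015-04-10, and the same state occurs again on 2015-04-13).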

A possible solution? (does not work)

SELECT ID, STATE, MIN(date), MAX(date) 
   FROM TABLE
   GROUP BY ID, STATE;

It does not work because it merges the intervals:

 ID      STATE    START DATE   END DATE
----------------------------------------
id01   connected  2015-04-04  XXXXXXXXXX
id01   disconect  2015-04-11  2015-04-12

The query runs in Impala (similar to SQL92).

2 answers:

Answer 0: (score: 1)

Impala supports window functions. This is a "gaps and islands" problem, so it can be solved using a difference of row numbers:

select id, state, min(date) as start_date, max(date) as end_date
from (select t.*,
             row_number() over (partition by id order by date) as seqnum_id,
             row_number() over (partition by id, state order by date) as seqnum_isd
      from table t
     ) t
group by id, state, (seqnum_id - seqnum_isd);

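The logic of the difference is not difficult, but it is tricky when you first learn it. It helps to run the subquery on its own and look at the row number values, and at why their difference identifies each group.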
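As a rough illustration of the row-number difference described above, the sketch below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and prints both row numbers together with their difference. SQLite stands in for Impala here, and the table name `events` is hypothetical; it assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical in-memory table "events" holding the sample rows from the question.
# SQLite is only a stand-in for Impala; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id TEXT, state TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("id01", "connected", "2015-04-04"),
        ("id01", "connected", "2015-04-05"),
        ("id01", "connected", "2015-04-08"),
        ("id01", "disconect", "2015-04-11"),
        ("id01", "disconect", "2015-04-12"),
        ("id01", "connected", "2015-04-13"),
    ],
)

# The same two row numbers as in the answer's subquery.
rows = conn.execute(
    """
    SELECT date, state,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS seqnum_id,
           ROW_NUMBER() OVER (PARTITION BY id, state ORDER BY date) AS seqnum_isd
    FROM events
    ORDER BY date
    """
).fetchall()

# Append the difference that the outer query groups by.
diffs = [(d, s, a, b, a - b) for d, s, a, b in rows]
for row in diffs:
    print(row)
```

With this data the difference is 0 for the first three "connected" rows, 3 for the two "disconect" rows, and 2 for the final "connected" row. Grouping by state and difference therefore keeps the two "connected" runs apart.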

Answer 1: (score: 0)

(Posted on behalf of the OP)

Gordon Linoff's answer, which treats my case as a "gaps and islands" problem, led to this solution:

select 
    id,
    state,
    start_date,
    date_add(lag(start_date, 1) over (partition by id order by start_date desc), -1) as end_date
from 
    (select id, state, min(date) as start_date, max(date) as end_date
        from (select t.*,
                row_number() over (partition by id order by date) as seqnum_id,
                row_number() over (partition by id, state order by date) as seqnum_isd
            from test t
        ) t
    group by id, state, (seqnum_id - seqnum_isd)) t_range
order by start_date;
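To check that the query above yields the ranges asked for in the question, here is a minimal sketch that runs an equivalent statement in SQLite through Python's sqlite3 module. It rests on a few assumptions: SQLite (3.25 or later) stands in for Impala, SQLite's date(x, '-1 day') replaces Impala's date_add(x, -1), and the unused max(date) column is left out.

```python
import sqlite3

# In-memory stand-in for the "test" table used in the answer (SQLite, not Impala).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id TEXT, state TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        ("id01", "connected", "2015-04-04"),
        ("id01", "connected", "2015-04-05"),
        ("id01", "connected", "2015-04-08"),
        ("id01", "disconect", "2015-04-11"),
        ("id01", "disconect", "2015-04-12"),
        ("id01", "connected", "2015-04-13"),
    ],
)

ranges = conn.execute(
    """
    SELECT id, state, start_date,
           -- LAG over a descending order returns the NEXT island's start date;
           -- subtracting one day gives the end of the current island.
           date(LAG(start_date, 1) OVER (PARTITION BY id ORDER BY start_date DESC),
                '-1 day') AS end_date
    FROM (
        -- One row per island: same state and same row-number difference.
        SELECT id, state, MIN(date) AS start_date
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS seqnum_id,
                   ROW_NUMBER() OVER (PARTITION BY id, state ORDER BY date) AS seqnum_isd
            FROM test t
        ) t
        GROUP BY id, state, (seqnum_id - seqnum_isd)
    ) t_range
    ORDER BY start_date
    """
).fetchall()

for row in ranges:
    print(row)
```

The last island has no following start date, so its end_date comes back as NULL (None in Python), which the question says is acceptable.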