The linked question 'calculate the time length' solved the problem of computing the time length within each subsequence.
The data looks like this:
time(string) id(int)
201801051127 0
201801051130 0
201801051132 0
201801051135 1
201801051141 1
201801051145 0
201801051147 0
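Now I have some additional requirements:
(1) The time length of the first sequence should start at '201801051100' and end at the start time of the next sequence, e.g. '201801051135', so the time length of the first sequence is 35;
(2) The time length of the second sequence should start at its own start time and end at the start time of the next sequence;
(3) The time length of the final sequence should start at its own start time and end at '201801051200'.
How can I implement these three rules (for the first, middle, and final sequences) in Hive, building on the code written in 'calculate the time length':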
with q1 as (
    select unix_timestamp(time, 'yyyyMMddHHmm')/60 time, id,
           case id when lag(id) over (order by time) then null else 1 end first_in_group
    from t
), q2 as (
    select time, id, count(first_in_group) over (order by time) grp_id
    from q1
)
select min(id) id, max(time) - min(time) minutes
from q2
group by grp_id
order by grp_id
Answer 0 (score: 0)
You can achieve this with a few small modifications to the query:
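with q1 as (
    -- convert each timestamp to minutes and flag the first row of every run of equal ids
    select unix_timestamp(time, 'yyyyMMddHHmm')/60 time, id,
           case id when lag(id) over (order by time) then null else 1 end first_in_group
    from t
), q2 as (
    -- keep only the row that starts each sequence
    select time, id
    from q1
    where first_in_group = 1
)
-- length = start of the next sequence minus this sequence's start;
-- the last sequence falls back to the fixed end time 201801051200
select id,
       lead(time, 1, unix_timestamp('201801051200', 'yyyyMMddHHmm')/60)
           over (order by time) - time as minutes
from q2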
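
One thing to watch: as written, the query above measures the first sequence from its first observed row (201801051127), so on the sample data it would report 8 minutes for that sequence rather than the 35 that rule (1) asks for. A minimal sketch of one way to pin the first start to 201801051100 follows. The names ts, rn, and start_ts are hypothetical, introduced only for this illustration, and the sketch assumes that table t contains no rows earlier than 201801051100.

```sql
with q1 as (
    select unix_timestamp(time, 'yyyyMMddHHmm')/60 as ts, id,
           -- 1 on the first row of every run of equal ids, null otherwise
           case id when lag(id) over (order by time) then null else 1 end as first_in_group,
           -- rn = 1 marks the very first row of the table (hypothetical helper column)
           row_number() over (order by time) as rn
    from t
), q2 as (
    select id,
           -- rule (1): the first sequence starts at the fixed time 201801051100
           case when rn = 1
                then unix_timestamp('201801051100', 'yyyyMMddHHmm')/60
                else ts
           end as start_ts
    from q1
    where first_in_group = 1
)
select id,
       -- rules (2) and (3): end at the next sequence's start, or at 201801051200 for the last one
       lead(start_ts, 1, unix_timestamp('201801051200', 'yyyyMMddHHmm')/60)
           over (order by start_ts) - start_ts as minutes
from q2
```

On the sample data the three sequences start at 201801051127, 201801051135, and 201801051145, so this variant should give 35, 10, and 15 minutes for ids 0, 1, and 0 respectively.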