Question

The linked question 'calculate the time length' solved the problem of computing the time length of each subsequence.

The data looks like this:

time(string)  id(int)
201801051127  0
201801051130  0
201801051132  0
201801051135  1
201801051141  1
201801051145  0
201801051147  0

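Now I have some additional requirements:

(1) The time length of the first sequence should start at '201801051100' and end at the start time of the next sequence, e.g. '201801051135', so the time length of the first sequence is 35;

(2) The time length of the second sequence should start at its own start time and end at the start time of the next sequence;

(3) The time length of the final sequence should start at its own start time and end at '201801051200'.

How can I implement these three rules (for the first sequence, the middle sequences, and the final sequence) in Hive, building on the code written in 'calculate the time length':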

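with q1 as (
    -- convert the timestamp to minutes and flag the first row of each run of equal ids
    select unix_timestamp(time, 'yyyyMMddHHmm')/60 time, id,
           case id when lag(id) over(order by time) then null else 1 end first_in_group
    from   t
), q2 as (
    -- a running count of the flags gives every run its own grp_id
    select time, id, count(first_in_group) over (order by time) grp_id
    from   q1
)
-- one row per run: minutes between its first and last timestamp
select   min(id) id, max(time) - min(time) minutes
from     q2
group by grp_id
order by grp_id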
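
To make the gap between this query and the three rules concrete, here is a minimal sketch in plain Python (not Hive) of what the query computes on the sample data: it splits the rows into runs of equal ids and takes the last timestamp minus the first within each run. The names `ROWS`, `to_minutes` and `run_lengths` are made up for this illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (time string, id)
ROWS = [
    ("201801051127", 0),
    ("201801051130", 0),
    ("201801051132", 0),
    ("201801051135", 1),
    ("201801051141", 1),
    ("201801051145", 0),
    ("201801051147", 0),
]

EPOCH = datetime(1970, 1, 1)


def to_minutes(ts):
    """Parse a yyyyMMddHHmm string into whole minutes since 1970-01-01."""
    return (datetime.strptime(ts, "%Y%m%d%H%M") - EPOCH) // timedelta(minutes=1)


def run_lengths(rows):
    """Return (id, last - first minutes) for each run of consecutive equal ids."""
    ordered = sorted(rows)  # fixed-width time strings sort chronologically
    result = []
    for run_id, run in groupby(ordered, key=lambda row: row[1]):
        times = [to_minutes(ts) for ts, _ in run]
        result.append((run_id, max(times) - min(times)))
    return result


print(run_lengths(ROWS))  # → [(0, 5), (1, 6), (0, 2)]
```

The result 5, 6, 2 measures each run only between its own first and last rows, whereas the rules above ask for 35, 10, 15.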

Answer 1

You can achieve this with a few small modifications to the query:

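with q1 as (
    -- convert the timestamp to minutes and flag the first row of each run of equal ids
    select unix_timestamp(time, 'yyyyMMddHHmm')/60 time, id,
           case id when lag(id) over(order by time) then null else 1 end first_in_group
    from   t
), q2 as (
    -- keep only the row that starts each sequence
    select time, id
    from   q1
    where  first_in_group = 1
)
select   id,
         -- end: start of the next sequence, or 201801051200 for the final one (rule 3)
         lead(time, 1, unix_timestamp('201801051200', 'yyyyMMddHHmm')/60) over (order by time)
         -- start: 201801051100 for the first sequence (rule 1), otherwise its own start
         - case when lag(time) over (order by time) is null
                then unix_timestamp('201801051100', 'yyyyMMddHHmm')/60
                else time end
         as minutes
from     q2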
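
As a cross-check of the numbers the three rules should produce on the sample data, here is a minimal sketch in plain Python (not Hive). It mirrors the idea of keeping one start row per sequence and subtracting it from the next sequence's start. The names `ROWS`, `FIRST_START`, `FINAL_END`, `to_minutes` and `sequence_lengths` are assumptions made for this illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (time string, id)
ROWS = [
    ("201801051127", 0),
    ("201801051130", 0),
    ("201801051132", 0),
    ("201801051135", 1),
    ("201801051141", 1),
    ("201801051145", 0),
    ("201801051147", 0),
]
FIRST_START = "201801051100"  # fixed start of the first sequence (rule 1)
FINAL_END = "201801051200"    # fixed end of the final sequence (rule 3)

EPOCH = datetime(1970, 1, 1)


def to_minutes(ts):
    """Parse a yyyyMMddHHmm string into whole minutes since 1970-01-01."""
    return (datetime.strptime(ts, "%Y%m%d%H%M") - EPOCH) // timedelta(minutes=1)


def sequence_lengths(rows, first_start, final_end):
    """Return (id, minutes) per sequence, following rules (1) to (3)."""
    ordered = sorted(rows)  # fixed-width time strings sort chronologically
    # One (id, start minute) entry per run of consecutive equal ids.
    starts = []
    for run_id, run in groupby(ordered, key=lambda row: row[1]):
        first_ts, _ = next(run)
        starts.append((run_id, to_minutes(first_ts)))
    if not starts:
        return []
    # Rule 1: the first sequence is measured from the fixed start time.
    starts[0] = (starts[0][0], to_minutes(first_start))
    # Rules 2 and 3: each sequence ends where the next one starts;
    # the final sequence ends at the fixed end time.
    ends = [start for _, start in starts[1:]] + [to_minutes(final_end)]
    return [(run_id, end - start) for (run_id, start), end in zip(starts, ends)]


print(sequence_lengths(ROWS, FIRST_START, FINAL_END))  # → [(0, 35), (1, 10), (0, 15)]
```

The first value, 35, matches the figure given in rule (1) of the question.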
