I have the following data:
time(string) id(int)
201801051127 0
201801051130 0
201801051132 0
201801051135 1
201801051141 1
201801051145 0
201801051147 0
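The data has three distinct parts, and I want to compute the duration of each one; for example, the first run of zeros lasts 5 minutes. If I simply group by id (0 and 1), the first run of zeros is merged with the third part, which is also a run of zeros, and that is not what I want. How can I compute the length of each of the three parts in SQL? The MySQL code I tried is as follows: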
SET @id_label:=0;
SELECT id_label,id,TIMESTAMPDIFF(MINUTE,MIN(DATE1),MAX(DATE1)) FROM
(SELECT id, DATE1, id_label FROM (
SELECT id, str_to_date ( TIME,'%Y%m%d%H%i' ) DATE1,
@id_label := IF(@id = id, @id_label, @id_label+1) id_label,
@id := id
FROM test.t
ORDER BY str_to_date ( TIME,'%Y%m%d%H%i' )
) a)b
GROUP BY id_label,id;
I don't know how to convert this to Hive.
Answer 0 (score: 1)
Try this.
SELECT id,
       ( MAX( TO_DATE ( time,'YYYYMMDDHH24MI' ) )
       - MIN( TO_DATE ( time,'YYYYMMDDHH24MI' ) ) ) *24*60 diff_in_minutes
FROM
(
  -- seq stays constant within each run of consecutive rows sharing the same id
  SELECT t.*,
         row_number() OVER ( ORDER BY
                 TO_DATE ( time,'YYYYMMDDHH24MI' ) )
       - row_number() OVER ( PARTITION BY id ORDER BY
                 TO_DATE ( time,'YYYYMMDDHH24MI' ) ) seq
  FROM Table1 t
)
GROUP BY id, seq
ORDER BY MAX(time)
;
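Edit: this answer was written when the OP had tagged the question oracle; the tag has since been changed to hive. In Hive, unix_timestamp(time, 'yyyyMMddHHmm') can be used in place of Oracle's TO_DATE. Note that it returns seconds rather than a date, so the difference should be divided by 60 to get minutes instead of being multiplied by 24*60.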
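To see why the difference between the two row_number() values identifies each run, here is a minimal Python sketch of the same idea applied to the sample data from the question. The names `ROWS` and `label_runs` are made up for illustration and are not part of the SQL above.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (time string, id).
ROWS = [
    ("201801051127", 0), ("201801051130", 0), ("201801051132", 0),
    ("201801051135", 1), ("201801051141", 1),
    ("201801051145", 0), ("201801051147", 0),
]

def label_runs(rows):
    """Return (id, seq, minutes) per run, mimicking the row_number() difference."""
    rows = sorted(rows)                 # fixed-width time strings sort chronologically
    per_id_rn = defaultdict(int)        # plays the role of row_number() OVER (PARTITION BY id ...)
    groups = {}                         # (id, seq) -> list of parsed timestamps
    for overall_rn, (ts, id_) in enumerate(rows, start=1):
        per_id_rn[id_] += 1
        seq = overall_rn - per_id_rn[id_]   # constant inside one run of equal ids
        parsed = datetime.strptime(ts, "%Y%m%d%H%M")
        groups.setdefault((id_, seq), []).append(parsed)
    result = []
    for (id_, seq), times in groups.items():
        minutes = int((max(times) - min(times)).total_seconds() // 60)
        result.append((id_, seq, minutes))
    return result

print(label_runs(ROWS))
```

Inside a run both counters advance together, so their difference does not change; when the id switches, only the overall counter has advanced in the meantime, which shifts the difference. Grouping by (id, seq) therefore keeps the two runs of zeros apart.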
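Answer 1 (score: 1)
I suggest two transformations: first mark the first row of each group, that is, each row whose id differs from the id of the previous row (lag); then take a running count of those marks, which gives every row its group number.
You can then group by the new group number.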
with q1 as (
select to_date(time, 'YYYYMMDDHH24MI') time, id,
case id when lag(id) over(order by time) then null else 1 end first_in_group
from t
), q2 as (
select time, id, count(first_in_group) over (order by time) grp_id
from q1
)
select min(id) id, (max(time) - min(time)) * 24 * 60 minutes
from q2
group by grp_id
order by grp_id
Different database engines use different functions for date/time values, so for Hive use unix_timestamp and work with the number of seconds it returns:
with q1 as (
select unix_timestamp(time, 'yyyyMMddHHmm')/60 time, id,
case id when lag(id) over(order by time) then null else 1 end first_in_group
from t
), q2 as (
select time, id, count(first_in_group) over (order by time) grp_id
from q1
)
select min(id) id, max(time) - min(time) minutes
from q2
group by grp_id
order by grp_id
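The two steps in the query (flag the first row of each group, then take a running count of the flags) can be traced on the question's sample data with a short Python sketch. This is only an illustration under assumptions: `SAMPLE` and `run_lengths` are hypothetical names, and the flag is stored as 0/1 and summed, whereas the SQL uses null/1 with count.

```python
from datetime import datetime
from itertools import accumulate

# Sample rows from the question: (time string, id).
SAMPLE = [
    ("201801051127", 0), ("201801051130", 0), ("201801051132", 0),
    ("201801051135", 1), ("201801051141", 1),
    ("201801051145", 0), ("201801051147", 0),
]

def run_lengths(rows):
    """Return (id, minutes) for each run of consecutive equal ids, in time order."""
    rows = sorted(rows)                       # order by time
    ids = [id_ for _, id_ in rows]
    # Step 1 (q1): 1 when the id differs from the previous row's id, else 0.
    first_in_group = [
        1 if i == 0 or ids[i] != ids[i - 1] else 0
        for i in range(len(ids))
    ]
    # Step 2 (q2): the running total of the flags is the group number.
    grp_ids = list(accumulate(first_in_group))
    spans = {}                                # grp_id -> (min time, max time, id)
    for (ts, id_), grp in zip(rows, grp_ids):
        t = datetime.strptime(ts, "%Y%m%d%H%M")
        lo, hi, _ = spans.get(grp, (t, t, id_))
        spans[grp] = (min(lo, t), max(hi, t), id_)
    return [
        (id_, int((hi - lo).total_seconds() // 60))
        for _, (lo, hi, id_) in sorted(spans.items())
    ]

print(run_lengths(SAMPLE))
```

For the sample data the flags are 1,0,0,1,0,1,0 and the group numbers are 1,1,1,2,2,3,3, so the three parts come out as separate groups.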