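I have a table in Hive (SQL) and need to group a set of timestamps so that separate sessions are created based on the time difference between consecutive timestamps.
Example:
Consider the following timestamps (given as HH.MM for simplicity):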
9.00
9.10
9.20
9.40
9.43
10.30
10.45
11.25
12.30
12.33
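and so on...
Now, every timestamp that lies within 30 minutes of the next timestamp belongs to the same session, i.e. 9.00, 9.10, 9.20, 9.40 and 9.43 form one session.
But since the difference between 9.43 and 10.30 is more than 30 minutes, the timestamp 10.30 belongs to a different session. Likewise, 10.30 and 10.45 form one session.
Once these sessions have been created, we need the minimum and maximum timestamp of each session.
I tried subtracting the current timestamp from its LEAD and setting a flag when the difference exceeds 30 minutes, but I am having trouble with this.
Any suggestions would be greatly appreciated. Please let me know if the question is not clear enough.
Expected output for this sample data: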
Session_start Session_end
9.00 9.43
10.30 10.45
11.25 11.25 (same because the next time is not within 30 mins)
12.30 12.33
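Hope this helps.
Answer 0 (score: 7)
So it is not MySQL, but Hive. I don't know Hive, but if it supports LAG, as you say, try this PostgreSQL query. You will probably have to change the time difference calculation, since that usually differs from one DBMS to another.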
select min(thetime) as start_time, max(thetime) as end_time
from
(
select thetime, count(gap) over (order by thetime rows between unbounded preceding and current row) as groupid
from
(
select thetime, case when thetime - lag(thetime) over (order by thetime) > interval '30 minutes' then 1 end as gap
from mytable
) times
) groups
group by groupid
order by min(thetime);
The query finds the gaps, then builds a group ID from a running count of those gaps; the rest is aggregation.
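To make the three steps easier to follow outside of SQL, here is a minimal Python sketch of the same idea: flag the gaps, turn the flags into a running count, then aggregate per group. The function name `find_sessions`, the `SAMPLE` list, and the H.MM string format are assumptions for illustration only; they do not come from the query above.

```python
from datetime import datetime, timedelta
from itertools import accumulate, groupby

# Sample timestamps from the question, in H.MM notation.
SAMPLE = ["9.00", "9.10", "9.20", "9.40", "9.43",
          "10.30", "10.45", "11.25", "12.30", "12.33"]


def find_sessions(stamps, max_gap=timedelta(minutes=30)):
    """Return (session_start, session_end) pairs as H.MM strings."""
    times = sorted(datetime.strptime(s, "%H.%M") for s in stamps)

    # Step 1: gap flag, 1 when the distance to the previous row exceeds max_gap.
    # The first row has no predecessor, so its flag is 0.
    gaps = [0] + [1 if cur - prev > max_gap else 0
                  for prev, cur in zip(times, times[1:])]

    # Step 2: the running total of the flags serves as the group id.
    group_ids = accumulate(gaps)

    # Step 3: aggregate min and max per group id.
    sessions = []
    for _, rows in groupby(zip(group_ids, times), key=lambda row: row[0]):
        members = [t for _, t in rows]
        sessions.append((min(members), max(members)))

    def fmt(t):
        return f"{t.hour}.{t.minute:02d}"

    return [(fmt(start), fmt(end)) for start, end in sessions]


if __name__ == "__main__":
    for start, end in find_sessions(SAMPLE):
        print(start, end)
```

The group id only changes on a row that follows a gap, so all rows between two gaps share one id, which is what makes the final aggregation possible.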
Answer 1 (score: 2)
Try this..
SELECT MIN(session_time_tmp) session_start, MAX(session_time_tmp) session_end FROM
(
SELECT IF((TIME_TO_SEC(TIMEDIFF(your_time_field, COALESCE(@previousValue, your_time_field))) / 60) > 30 ,
@sessionCount := @sessionCount + 1, @sessionCount ) sessCount,
( @previousValue := your_time_field ) session_time_tmp FROM
(
SELECT your_time_field, @previousValue:= NULL, @sessionCount := 1 FROM yourtable ORDER BY your_time_field
) a
) b
GROUP BY sessCount
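Just replace yourtable and your_time_field.
Answer 2 (score: 2)
As MySQL (before version 8.0) lacks the LAG and LEAD functions, getting the previous or next record already takes some work. Here is how: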
select
thetime,
(select max(thetime) from mytable afore where afore.thetime < mytable.thetime) as afore_time,
(select min(thetime) from mytable after where after.thetime > mytable.thetime) as after_time
from mytable;
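Based on this, we can build the whole query, in which we look for gaps (i.e. a time difference of more than 30 minutes = 1800 seconds to the previous or next record).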
select
startrec.thetime as start_time,
(
select min(endrec.thetime)
from
(
select
thetime,
coalesce(time_to_sec(timediff((select min(thetime) from mytable after where after.thetime > mytable.thetime), thetime)), 1801) > 1800 as gap
from mytable
) endrec
where gap
and endrec.thetime >= startrec.thetime
) as end_time
from
(
select
thetime,
coalesce(time_to_sec(timediff(thetime, (select max(thetime) from mytable afore where afore.thetime < mytable.thetime))), 1801) > 1800 as gap
from mytable
) startrec
where gap;
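The logic of this query may be easier to see in procedural form. The following Python sketch assumes the same rules: a row starts a session when it has no predecessor or the predecessor is more than 30 minutes earlier, a row ends a session when it has no successor or the successor is more than 30 minutes later, and each start is paired with the earliest end that is not before it. The names `session_bounds`, `to_minutes` and `SAMPLE` are hypothetical, and timestamps are assumed to be H.MM strings within a single day.

```python
# Sample timestamps from the question, in H.MM notation.
SAMPLE = ["9.00", "9.10", "9.20", "9.40", "9.43",
          "10.30", "10.45", "11.25", "12.30", "12.33"]


def to_minutes(stamp):
    """Convert an H.MM string to minutes since midnight."""
    hours, minutes = stamp.split(".")
    return int(hours) * 60 + int(minutes)


def to_stamp(total):
    """Convert minutes since midnight back to an H.MM string."""
    return f"{total // 60}.{total % 60:02d}"


def session_bounds(stamps, max_gap=30):
    """Return (session_start, session_end) pairs as H.MM strings."""
    times = sorted(to_minutes(s) for s in stamps)
    last = len(times) - 1

    # Start records: no previous row, or the previous row is too far back.
    starts = [t for i, t in enumerate(times)
              if i == 0 or t - times[i - 1] > max_gap]

    # End records: no next row, or the next row is too far ahead.
    ends = [t for i, t in enumerate(times)
            if i == last or times[i + 1] - t > max_gap]

    # Pair every start with the earliest end record at or after it.
    return [(to_stamp(s), to_stamp(min(e for e in ends if e >= s)))
            for s in starts]


if __name__ == "__main__":
    for start, end in session_bounds(SAMPLE):
        print(start, end)
```

A missing neighbour is treated as a gap here, which plays the same role as the `coalesce(..., 1801)` in the query.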
Answer 3 (score: 1)
Try this:
SELECT DATE_FORMAT(MIN(STR_TO_DATE(B.column1, '%H.%i')), '%H.%i') AS Session_start,
DATE_FORMAT(MAX(STR_TO_DATE(B.column1, '%H.%i')), '%H.%i') AS Session_end
FROM tableA A
LEFT JOIN ( SELECT A.column1, diff, IF(@diff:=diff < 30, @id, @id:=@id+1) AS rnk
FROM (SELECT B.column1, TIME_TO_SEC(TIMEDIFF(STR_TO_DATE(B.column1, '%H.%i'), STR_TO_DATE(A.column1, '%H.%i'))) / 60 AS diff
FROM tableA A
INNER JOIN tableA B ON STR_TO_DATE(A.column1, '%H.%i') < STR_TO_DATE(B.column1, '%H.%i')
GROUP BY STR_TO_DATE(A.column1, '%H.%i')
) AS A, (SELECT @diff:=0, @id:= 1) AS B
) AS B ON A.column1 = B.column1
GROUP BY IFNULL(B.rnk, 1);
Output
| SESSION_START | SESSION_END |
|---------------|-------------|
| 9.00 | 9.43 |
| 10.30 | 10.45 |
| 11.25 | 11.25 |
| 12.30 | 12.33 |