Group timestamps into sessions based on the interval between them

Time: 2015-01-09 10:24:41

Tags: sql session select group-by hive

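I have a table in Hive (SQL) and need to group a set of timestamps so that I can build separate sessions based on the time difference between consecutive timestamps.

Example: consider the following timestamps (given as hours and minutes, HH.MM, for simplicity): 9.00, 9.10, 9.20, 9.40, 9.43, 10.30, 10.45, 11.25, 12.30, 12.33, and so on.

All timestamps that fall within 30 minutes of the next timestamp belong to the same session, i.e. 9.00, 9.10, 9.20, 9.40 and 9.43 form one session.

But since the difference between 9.43 and 10.30 is more than 30 minutes, the timestamp 10.30 falls into a different session. Likewise, 10.30 and 10.45 fall into one session.

Once these sessions have been created, we have to obtain the minimum and maximum timestamp of each session.

I tried subtracting the current timestamp from its LEAD and setting a flag when the difference is greater than 30 minutes, but I am having difficulty with this.

Any suggestions would be greatly appreciated. Let me know if the question is not clear enough.

Expected output for this sample data: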

Session_start   Session_end
9.00                9.43
10.30               10.45
11.25               11.25 (same because the next time is not within 30 mins)
12.30               12.33

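Hope this helps.

4 answers:

Answer 0: (score: 7)

So it is not MySQL, but Hive. I don't know Hive, but if it supports LAG, as you say, try this PostgreSQL query. You will probably have to change the time difference calculation, as that usually differs from one DBMS to another.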

select min(thetime) as start_time, max(thetime) as end_time
from
(
  select thetime, count(gap) over (order by thetime rows between unbounded preceding and current row) as groupid
  from
  (
    select thetime, case when thetime - lag(thetime) over (order by thetime) > interval '30 minutes' then 1 end as gap
    from mytable
  ) times
) groups
group by groupid
order by min(thetime);

The query finds the gaps, then builds a group ID from a running count of those gaps; the rest is aggregation.

SQL Fiddle: http://www.sqlfiddle.com/#!17/8bc4a/6
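To make the three levels of that query easier to follow, here is a minimal Python sketch of the same idea: flag the gaps, take a running total of the flags as a group ID, then aggregate per group. It is only an illustration and not Hive or PostgreSQL code; the helper names `to_minutes` and `sessions` are made up for this example, and the timestamps are assumed to be `H.MM` strings as in the question.

```python
from itertools import accumulate, groupby

GAP_MINUTES = 30  # session timeout taken from the question


def to_minutes(stamp):
    """Convert an 'H.MM' string such as '9.43' to minutes since midnight."""
    hours, minutes = stamp.split(".")
    return int(hours) * 60 + int(minutes)


def sessions(stamps):
    """Return (start, end) pairs, mirroring the three query levels."""
    ordered = sorted(stamps, key=to_minutes)
    if not ordered:
        return []
    # Inner level: flag a row with 1 when it is more than 30 minutes after
    # the previous row (the LAG comparison). The first row has no predecessor.
    flags = [0] + [
        1 if to_minutes(cur) - to_minutes(prev) > GAP_MINUTES else 0
        for prev, cur in zip(ordered, ordered[1:])
    ]
    # Middle level: a running total of the flags acts as the group ID.
    group_ids = list(accumulate(flags))
    # Outer level: aggregate per group ID. Rows are already in time order,
    # so the first and last rows of a group are its minimum and maximum.
    result = []
    for _, rows in groupby(zip(group_ids, ordered), key=lambda pair: pair[0]):
        members = [stamp for _, stamp in rows]
        result.append((members[0], members[-1]))
    return result


sample = ["9.00", "9.10", "9.20", "9.40", "9.43",
          "10.30", "10.45", "11.25", "12.30", "12.33"]
for start, end in sessions(sample):
    print(start, end)
```

On the sample data from the question this yields the four sessions listed in the expected output.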

Answer 1: (score: 2)

Try this:

SELECT MIN(session_time_tmp) session_start, MAX(session_time_tmp) session_end
FROM
(
  SELECT IF((TIME_TO_SEC(TIMEDIFF(your_time_field, COALESCE(@previousValue, your_time_field))) / 60) > 30,
            @sessionCount := @sessionCount + 1, @sessionCount) sessCount,
         (@previousValue := your_time_field) session_time_tmp
  FROM
  (
    SELECT your_time_field, @previousValue := NULL, @sessionCount := 1
    FROM yourtable
    ORDER BY your_time_field
  ) a
) b
GROUP BY sessCount

Just replace yourtable and your_time_field with your own table and column names.
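The query above walks the rows in time order while carrying the previous value and a session counter in user variables. A rough Python sketch of that single-pass idea follows. It is not a translation of the MySQL statement: the function name `sessionize` is hypothetical, and because the question gives only times of day, the sample data assumes an arbitrary date.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, timeout=timedelta(minutes=30)):
    """Scan sorted timestamps once, carrying the previous value as state."""
    previous = None
    found = []  # one [start, end] pair per session, in time order
    for ts in sorted(timestamps):
        if previous is None or ts - previous > timeout:
            # First row, or a gap longer than the timeout: open a new session.
            found.append([ts, ts])
        else:
            # Still inside the current session: extend its end.
            found[-1][1] = ts
        previous = ts
    return [tuple(pair) for pair in found]


# The date below is an assumption for illustration; only the times come
# from the question.
times = [(9, 0), (9, 10), (9, 20), (9, 40), (9, 43),
         (10, 30), (10, 45), (11, 25), (12, 30), (12, 33)]
sample = [datetime(2015, 1, 9, hour, minute) for hour, minute in times]

for start, end in sessionize(sample):
    print(start.strftime("%H.%M"), end.strftime("%H.%M"))
```

Keeping the state in ordinary local variables avoids depending on the order in which a database evaluates variable assignments within a single statement.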

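Answer 2: (score: 2)

Since MySQL (before version 8.0) lacks the LAG and LEAD functions, getting the previous or next record already takes some work. Here is how: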

select 
  thetime, 
  (select max(thetime) from mytable afore where afore.thetime < mytable.thetime) as afore_time,
  (select min(thetime) from mytable after where after.thetime > mytable.thetime) as after_time
from mytable;

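Based on this we can build the whole query, in which we look for gaps (i.e. a time difference of more than 30 minutes = 1800 seconds to the previous or next record).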

select
  startrec.thetime as start_time,
  (
    select min(endrec.thetime) 
    from 
    (
      select 
        thetime, 
        coalesce(time_to_sec(timediff((select min(thetime) from mytable after where after.thetime > mytable.thetime), thetime)), 1801) > 1800 as gap
      from mytable
    ) endrec
    where gap
    and endrec.thetime >= startrec.thetime
  ) as end_time
from
(
  select 
    thetime, 
    coalesce(time_to_sec(timediff(thetime, (select max(thetime) from mytable afore where afore.thetime < mytable.thetime))), 1801) > 1800 as gap
  from mytable
) startrec
where gap;

SQL Fiddle: http://www.sqlfiddle.com/#!2/d307b/20
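The logic of that query can be sketched in Python as follows: a row starts a session when nothing precedes it within the timeout, a row ends a session when nothing follows it within the timeout, and each start is paired with the earliest end at or after it. This is a hedged illustration only; `find_sessions` is a made-up name, and the timestamps are assumed to be already converted to minutes since midnight.

```python
from bisect import bisect_left


def find_sessions(minutes, timeout=30):
    """Pair every session start with the earliest session end at or after it."""
    # Duplicates are dropped because the subqueries compare with strict < and >,
    # so an identical timestamp never counts as a neighbour.
    ordered = sorted(set(minutes))
    last = len(ordered) - 1
    # A start has no earlier row, or its predecessor is more than `timeout`
    # away (the COALESCE(..., 1801) > 1800 test on the previous row).
    starts = [cur for i, cur in enumerate(ordered)
              if i == 0 or cur - ordered[i - 1] > timeout]
    # An end has no later row, or its successor is more than `timeout` away.
    ends = [cur for i, cur in enumerate(ordered)
            if i == last or ordered[i + 1] - cur > timeout]
    # `ends` is sorted, so bisect_left locates the first end >= start,
    # which plays the role of min(endrec.thetime) in the query.
    return [(start, ends[bisect_left(ends, start)]) for start in starts]


# 9.00, 9.10, ... 12.33 from the question, as minutes since midnight.
sample = [540, 550, 560, 580, 583, 630, 645, 685, 750, 753]
for start, end in find_sessions(sample):
    print(f"{start // 60}.{start % 60:02d}", f"{end // 60}.{end % 60:02d}")
```

The final row is always an end, so every start is guaranteed to find a matching end.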

Answer 3: (score: 1)

Try this:

SELECT DATE_FORMAT(MIN(STR_TO_DATE(B.column1, '%H.%i')), '%H.%i') AS Session_start, 
       DATE_FORMAT(MAX(STR_TO_DATE(B.column1, '%H.%i')), '%H.%i') AS Session_end
FROM tableA A
LEFT JOIN ( SELECT A.column1, diff, IF(@diff:=diff < 30, @id, @id:=@id+1) AS rnk
            FROM (SELECT B.column1, TIME_TO_SEC(TIMEDIFF(STR_TO_DATE(B.column1, '%H.%i'), STR_TO_DATE(A.column1, '%H.%i'))) / 60 AS diff
                  FROM tableA A
                  INNER JOIN tableA B ON STR_TO_DATE(A.column1, '%H.%i') < STR_TO_DATE(B.column1, '%H.%i') 
                  GROUP BY STR_TO_DATE(A.column1, '%H.%i')
                 ) AS A, (SELECT @diff:=0, @id:= 1) AS B
           ) AS B ON A.column1 = B.column1
GROUP BY IFNULL(B.rnk, 1);

Check the SQL FIDDLE DEMO

Output:

| SESSION_START | SESSION_END |
|---------------|-------------|
|          9.00 |        9.43 |
|         10.30 |       10.45 |
|         11.25 |       11.25 |
|         12.30 |       12.33 |