Grouping by time difference in Amazon Redshift

Time: 2018-11-13 22:05:24

Tags: sql amazon-redshift

I am using the following query:

SELECT a.session_id,
         a.created_at,
         COUNT(DISTINCT a.mongo_id) AS events
  FROM table1 a
    JOIN table1 b ON a.session_id = b.session_id
  GROUP BY a.session_id,
           a.created_at
  ORDER BY a.session_id,
           a.created_at,
           COUNT(DISTINCT a.mongo_id) DESC

It returns the following result:

Session1    2018-10-09 14:04:31.0   22
Session1    2018-10-09 14:04:32.0   10
Session1    2018-10-09 14:04:34.0   1
Session1    2018-10-09 14:04:38.0   1
Session1    2018-10-09 14:04:41.0   1
Session1    2018-10-09 14:04:42.0   1
Session1    2018-10-09 14:04:43.0   2
Session1    2018-10-09 14:04:44.0   2
Session1    2018-10-09 14:04:45.0   1
Session1    2018-10-09 14:04:46.0   2
Session1    2018-10-09 14:04:47.0   2
Session1    2018-10-09 14:04:50.0   2
Session1    2018-10-09 14:04:51.0   2
Session1    2018-10-09 14:04:52.0   1
Session1    2018-10-09 14:04:53.0   1
Session1    2018-10-09 14:04:55.0   1
Session1    2018-10-09 14:04:56.0   1
Session1    2018-10-09 14:04:57.0   1
Session1    2018-10-09 14:05:00.0   1
Session1    2018-10-09 14:05:01.0   2
Session1    2018-10-09 14:05:03.0   3
Session1    2018-10-09 14:05:06.0   1
Session1    2018-10-09 14:05:07.0   2
Session1    2018-10-09 14:05:09.0   4
Session1    2018-10-09 14:05:10.0   30

I want to group all events that occur within 3 seconds of each other, to get the following result:

Session1    2018-10-09 14:04:31.0   33
Session1    2018-10-09 14:04:38.0   2
Session1    2018-10-09 14:04:42.0   6
Session1    2018-10-09 14:04:46.0   4
Session1    2018-10-09 14:04:50.0   6
Session1    2018-10-09 14:04:55.0   3
Session1    2018-10-09 14:05:00.0   6
Session1    2018-10-09 14:05:06.0   7
Session1    2018-10-09 14:05:10.0   30

That is, the last column should be the sum of the events that fall within each 3-second span, as shown above.

To achieve this, I tried the following query:

WITH t AS
(
  SELECT a.session_id,
         a.created_at,
         COUNT(DISTINCT a.mongo_id) AS events
  FROM table1 a
    JOIN table1 b ON a.session_id = b.session_id
  GROUP BY a.session_id,
           a.created_at
  ORDER BY a.session_id,
           a.created_at,
           COUNT(DISTINCT a.mongo_id) DESC
)
SELECT a.session_id,
       TIMESTAMP WITH TIME ZONE 'epoch' +INTERVAL '1 second' *ROUND(EXTRACT('epoch' FROM a.created_at) / 3)*3 AS TIMESTAMP,
       SUM(b.events)
FROM t AS a
  JOIN t AS b ON a.session_id = b.session_id
GROUP BY a.session_id,
         ROUND(EXTRACT('epoch' FROM a.created_at) / 3)
ORDER BY a.session_id,
         TIMESTAMP

But this gives me the wrong numbers.

How can I achieve this? Any help would be appreciated.

1 Answer:

Answer 0 (score: 0)

Let me assume that you have somehow produced the per-timestamp results shown in the question. Then you can use window functions:

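with results as (
      <whatever>
     )
select session_id, min(created_at), max(created_at), sum(events)
from (select r.*,
             -- running count of group starts, used as the group identifier;
             -- CASE (rather than a boolean cast) keeps the first row, whose prev_ca is NULL, at 0
             sum(case when prev_ca < created_at - interval '3 second' then 1 else 0 end)
                 over (partition by session_id order by created_at
                       rows between unbounded preceding and current row) as grp
      from (select r.*,
                   -- created_at of the previous row in the same session (NULL for the first row)
                   lag(created_at) over (partition by session_id order by created_at) as prev_ca
            from results r
           ) r
     ) r
group by session_id, grp;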

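This determines where a group starts by looking at the previous created_at and checking whether it is more than 3 seconds earlier than the current row. If it is, a new group starts.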

The cumulative sum of the group starts is a grouping identifier that can be used for aggregation.
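The same gap-based rule can be traced outside the database. The following is only a minimal Python sketch of the logic, not Redshift code: the function name `group_by_gap`, the helper `ts`, and the `sample` list (the first five rows from the question) are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def group_by_gap(rows, max_gap=timedelta(seconds=3)):
    """Group (session_id, created_at, events) rows per session.

    A new group starts when the previous row of the same session is
    more than max_gap earlier than the current row (the lag() test).
    Returns (session_id, first_created_at, last_created_at, total_events).
    """
    groups = []      # each entry: [session_id, first_ts, last_ts, total_events]
    prev_key = None  # (session_id, created_at) of the previous row
    for session_id, created_at, events in sorted(rows):
        new_group = (
            prev_key is None
            or prev_key[0] != session_id
            or prev_key[1] < created_at - max_gap
        )
        if new_group:
            groups.append([session_id, created_at, created_at, events])
        else:
            groups[-1][2] = created_at   # extend the current group
            groups[-1][3] += events      # accumulate its event count
        prev_key = (session_id, created_at)
    return [tuple(g) for g in groups]


def ts(hms):
    """Hypothetical helper: build a timestamp on the day used in the question."""
    return datetime.strptime("2018-10-09 " + hms, "%Y-%m-%d %H:%M:%S")


# First five rows of the result set shown in the question.
sample = [
    ("Session1", ts("14:04:31"), 22),
    ("Session1", ts("14:04:32"), 10),
    ("Session1", ts("14:04:34"), 1),
    ("Session1", ts("14:04:38"), 1),
    ("Session1", ts("14:04:41"), 1),
]

for session_id, first_ts, last_ts, total in group_by_gap(sample):
    print(session_id, first_ts, last_ts, total)
# Session1 2018-10-09 14:04:31 2018-10-09 14:04:34 33
# Session1 2018-10-09 14:04:38 2018-10-09 14:04:41 2
```

Because each row is compared only with its immediate predecessor, a chain of rows that each follow the previous one by at most 3 seconds stays in a single group, even if the chain as a whole spans more than 3 seconds.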