Rank records by timestamp, but omit records that are less than 30 minutes apart

Asked: 2015-08-11 14:20:15

Tags: sql ranking amazon-redshift

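I am writing an Amazon Redshift SQL query and trying to rank records by three columns: timestamp, cookieID, and trackingpointID. However, when two consecutive records in that ordering (same cookieID and same tracking point) are less than 30 minutes apart, I want to omit the earlier ones and keep only the highest (latest) record. For example, if I have: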

timestamp   cookie  track
9:04:29     A       10420641
9:04:32     A       10420641
9:04:36     A       10420641
9:04:32     A       10420641
10:30:00    A       10420641
10:31:21    A       10420641
9:07:01     A       10881111
9:07:34     A       10881111
9:07:45     A       10881111
9:04:39     A       4326086

I would like to get this result:

timestamp   cookie  track       row
9:04:36     A       10420641    1
10:31:21    A       10420641    2
9:07:45     A       10881111    1
9:04:39     A       4326086     1

1 answer:

Answer 0 (score: 0)

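It sounds like you want to sessionize the data. You can do this with lag() and a cumulative sum: flag each row that starts a new session (more than 30 minutes after the previous row for the same cookie and track), then take a running sum of those flags to number the sessions. Something like this: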

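-- max() keeps the latest record of each session, as the question asks;
-- use min() instead if you want the session start.
select max(timestamp), cookie, track,
       sessionid + 1 as row_num  -- sessionid starts at 0, so add 1 for a 1-based rank
from (select r.*,
             -- running count of session starts = session number within (cookie, track);
             -- Redshift requires a frame clause for an aggregate window function with ORDER BY
             sum(IsSessionStart) over (partition by cookie, track order by timestamp
                                       rows unbounded preceding) as sessionid
      from (select r.*,
                   -- 1 when more than 30 minutes have passed since the previous row;
                   -- the first row of a partition has no previous row (lag is NULL), so it gets 0
                   (case when datediff(min,
                                       lag(timestamp) over (partition by cookie, track order by timestamp),
                                       timestamp) > 30
                         then 1 else 0 end) as IsSessionStart
            from records r
           ) r
     ) r
group by cookie, track, sessionid;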
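
To check the sessionization logic outside Redshift, here is a minimal Python sketch that applies the same idea (compare each row with the previous one in its cookie/track group, start a new session when the gap exceeds 30 minutes) to the sample rows from the question. The names `ROWS`, `parse`, and `sessionize` are made up for this illustration. It keeps the latest timestamp per session, as in the question's expected result, and it compares exact time differences, whereas datediff(min, ...) counts minute boundaries, so the two can disagree for gaps very close to 30 minutes.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (timestamp, cookie, track).
ROWS = [
    ("9:04:29", "A", 10420641),
    ("9:04:32", "A", 10420641),
    ("9:04:36", "A", 10420641),
    ("9:04:32", "A", 10420641),
    ("10:30:00", "A", 10420641),
    ("10:31:21", "A", 10420641),
    ("9:07:01", "A", 10881111),
    ("9:07:34", "A", 10881111),
    ("9:07:45", "A", 10881111),
    ("9:04:39", "A", 4326086),
]

GAP = timedelta(minutes=30)  # session threshold taken from the question


def parse(ts):
    """Parse an H:MM:SS string into a datetime (the date part is irrelevant here)."""
    return datetime.strptime(ts, "%H:%M:%S")


def sessionize(rows, gap=GAP):
    """Return (timestamp, cookie, track, row) tuples, one per session,
    where timestamp is the latest one in the session and row is 1-based."""
    # Sort by cookie, track, timestamp: mirrors "partition by ... order by ...".
    ordered = sorted(rows, key=lambda r: (r[1], r[2], parse(r[0])))
    result = []
    for (cookie, track), group in groupby(ordered, key=lambda r: (r[1], r[2])):
        session = 0   # plays the role of the cumulative sum of session-start flags
        prev = None   # plays the role of lag(timestamp)
        last = None   # latest timestamp seen in the current session
        for ts, _, _ in group:
            t = parse(ts)
            if prev is not None and t - prev > gap:
                # Gap exceeded: close the current session and start a new one.
                result.append((last, cookie, track, session + 1))
                session += 1
            prev = t
            last = ts
        result.append((last, cookie, track, session + 1))
    return result


if __name__ == "__main__":
    for row in sessionize(ROWS):
        print(row)
```

The result contains four rows, matching the expected output in the question, although they come out ordered by cookie and track rather than in the order shown there.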