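I am writing an Amazon Redshift SQL query. I want to sort the records by three columns: timestamp, cookieID and trackingpointID. When two records with the same cookieID and tracking point are less than 30 minutes apart, I want to drop the earlier ones and keep only the latest record. For example, if I have: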
timestamp cookie track
9:04:29 A 10420641
9:04:32 A 10420641
9:04:36 A 10420641
9:04:32 A 10420641
10:30:00 A 10420641
10:31:21 A 10420641
9:07:01 A 10881111
9:07:34 A 10881111
9:07:45 A 10881111
9:04:39 A 4326086
I would like to get this result:
timestamp cookie track row
9:04:36 A 10420641 1
10:31:21 A 10420641 2
9:07:45 A 10881111 1
9:04:39 A 4326086 1
Answer (score: 0)
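It sounds like you want to sessionize your data. You can do this with lag() and a cumulative sum: lag() flags a row as the start of a new session when it is the first row for its cookie/track or comes more than 30 minutes after the previous row, and a running sum of those flags numbers the sessions. Grouping by that session number then collapses each session to one row. Something like this: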
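-- max() keeps the latest record in each session
select max(timestamp) as timestamp, cookie, track, sessionid
from (select r.*,
             -- running count of session starts = session number (1, 2, ...)
             sum(IsSessionStart) over (partition by cookie, track order by timestamp
                                       rows unbounded preceding) as sessionid
      from (select r.*,
                   -- a row starts a new session if it is the first one for this
                   -- cookie/track or more than 30 minutes after the previous one
                   (case when lag(timestamp) over (partition by cookie, track order by timestamp) is null
                           or datediff(minute,
                                       lag(timestamp) over (partition by cookie, track order by timestamp),
                                       timestamp) > 30
                         then 1 else 0 end) as IsSessionStart
            from records r
           ) r
     ) r
group by cookie, track, sessionid
order by cookie, track, sessionid;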
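To make the logic of the query easier to follow, here is a minimal Python sketch of the same idea, run on the sample data from the question. It is not Redshift code. For each cookie/track pair it sorts by time, starts a new session when the gap from the previous row is 30 minutes or more (the question's "less than 30 minutes apart" rule; the SQL above uses datediff in minutes instead), numbers sessions from 1, and keeps the latest timestamp of each session. The function name sessionize, the H:MM:SS parsing and the exact-30-minute cutoff are assumptions for illustration.

```python
from collections import defaultdict
from datetime import timedelta

# Assumed threshold: a gap of 30 minutes or more starts a new session.
SESSION_GAP = timedelta(minutes=30)


def parse_time(s: str) -> timedelta:
    """Parse an 'H:MM:SS' string into a timedelta (assumed input format)."""
    h, m, sec = (int(p) for p in s.split(":"))
    return timedelta(hours=h, minutes=m, seconds=sec)


def sessionize(records):
    """Hypothetical helper mirroring the SQL query.

    records: iterable of (timestamp_str, cookie, track).
    Returns (timestamp_str, cookie, track, session_num) tuples, one per
    session, keeping the latest timestamp of each session.
    """
    # Equivalent of "partition by cookie, track"
    groups = defaultdict(list)
    for ts, cookie, track in records:
        groups[(cookie, track)].append(ts)

    result = []
    for (cookie, track), stamps in groups.items():
        stamps.sort(key=parse_time)  # "order by timestamp"
        session_num = 0
        prev = None
        for ts in stamps:
            t = parse_time(ts)
            # Like lag(): compare with the previous row of the same partition
            if prev is None or t - prev >= SESSION_GAP:
                session_num += 1  # like the running sum of IsSessionStart
                result.append([ts, cookie, track, session_num])
            else:
                # Same session: rows are sorted, so the later one is the max
                result[-1][0] = ts
            prev = t
    return [tuple(r) for r in result]


EXAMPLE = [
    ("9:04:29", "A", 10420641),
    ("9:04:32", "A", 10420641),
    ("9:04:36", "A", 10420641),
    ("9:04:32", "A", 10420641),
    ("10:30:00", "A", 10420641),
    ("10:31:21", "A", 10420641),
    ("9:07:01", "A", 10881111),
    ("9:07:34", "A", 10881111),
    ("9:07:45", "A", 10881111),
    ("9:04:39", "A", 4326086),
]

if __name__ == "__main__":
    for row in sessionize(EXAMPLE):
        print(*row)
```

On the sample data this yields the four rows the question asks for: 9:04:36 and 10:31:21 for track 10420641 (sessions 1 and 2), 9:07:45 for 10881111 and 9:04:39 for 4326086.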