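I have a table of user activity records, each covering a range given by a start and an end time. I'm looking for the number of users active in the system per unit of time over the previous day.
Sessions last at most one hour and do not cross hour boundaries. A session can end and a new session can start within the same minute.
Here is a stripped-down version of the query: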
with minutes AS (
-- ignore this...it generates a day's worth of timestamps for each minute
-- it's hairy but is what I'm stuck with on redshift
select (dateadd(minute, -row_number() over (order by true), sysdate::date)) as minute
from seed_table limit 1440
),
sessions as (
select sid, ts_start, ts_end
from user_sessions s
where ts_end >= sysdate::date-'1 day'::interval
and ts_start < sysdate::date
)
select m.minute, count(distinct(s.sid))
from minutes m
left join sessions s on s.ts_end >= m.minute and s.ts_start < m.minute+'1 min'::interval
group by 1
I'm trying to avoid that nasty left join:
-> XN Nested Loop Left Join DS_BCAST_INNER (cost=6913826151.95..4727012848741.55 rows=410434560 width=166)
Join Filter: (("inner".ts_start < ("outer"."minute" + '00:01:00'::interval)) AND ("inner".ts_end >= "outer"."minute"))
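To make explicit what the join predicate in the query above counts, here is a minimal Python sketch of the same overlap rule (`ts_end >= minute` and `ts_start < minute + 1 min`). The rows, the `sid` values "a" and "b", and the helper `active_per_minute` are hypothetical and exist only for illustration; this is not how Redshift evaluates the join.

```python
from datetime import datetime, timedelta

# Hypothetical toy rows shaped like user_sessions: (sid, ts_start, ts_end).
# Session "a" ends and session "b" starts within the same minute (10:02).
sessions = [
    ("a", datetime(2024, 1, 1, 10, 0, 10), datetime(2024, 1, 1, 10, 2, 20)),
    ("b", datetime(2024, 1, 1, 10, 2, 30), datetime(2024, 1, 1, 10, 3, 5)),
]


def active_per_minute(sessions, first_minute, n_minutes):
    """Count distinct sids whose range overlaps each one-minute bucket."""
    one_min = timedelta(minutes=1)
    result = {}
    for i in range(n_minutes):
        minute = first_minute + i * one_min
        # Same predicate as the left join in the query.
        sids = {
            sid
            for sid, ts_start, ts_end in sessions
            if ts_end >= minute and ts_start < minute + one_min
        }
        result[minute] = len(sids)
    return result


counts = active_per_minute(sessions, datetime(2024, 1, 1, 10, 0), 5)
for minute, n in counts.items():
    print(minute.strftime("%H:%M"), n)
```

On this toy data the 10:02 bucket matches both rows, because one session ends and the other starts inside that minute. Every bucket is compared against every session, which is the same all-pairs work the nested loop join performs.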
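Based on Gordon Linoff's answer, the following almost works for me. It over-counts when a user's session transitions within a minute, but it seems like the right direction. The original query probably over-counts for the same reason, but being able to count distinct session IDs per minute takes care of that.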
select minute, sum(count) over (order by minute rows unbounded preceding) as users
from (
select minute, sum(count) as count
from (
(
select date_trunc('minute', ts_start) as minute, count(*) as count
from sessions
group by 1
) union all (
select date_trunc('minute', ts_end) as minute, - count(*) as count
from sessions
group by 1
)
) s1
group by minute
) s2
order by minute;
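For comparison, here are the timing results for one hour of data:
Answer 0 (score: 2)
This can be done faster by counting the number of starts and stops in each minute and then taking the cumulative sum. The result looks like this: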
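-- sum(cnt) is aggregated per minute first, then accumulated; Redshift needs an explicit frame when ORDER BY is used
select minute, sum(sum(cnt)) over (order by minute rows unbounded preceding) as users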
from ((select date_trunc('minute', ts_start) as minute, count(*) as cnt
from sessions
group by 1
) union all
(select date_trunc('minute', ts_end), - count(*)
from sessions
group by 1
)
) s
group by minute
order by minute;
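The start/stop counting idea can be traced by hand on a tiny data set. The following is a minimal Python sketch under assumptions: the rows and the helper `running_active` are hypothetical, and truncating to the minute with `replace(second=0, microsecond=0)` stands in for `date_trunc('minute', ...)`.

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate

# Hypothetical toy rows shaped like user_sessions: (sid, ts_start, ts_end).
sessions = [
    ("a", datetime(2024, 1, 1, 10, 0, 10), datetime(2024, 1, 1, 10, 2, 20)),
    ("b", datetime(2024, 1, 1, 10, 2, 30), datetime(2024, 1, 1, 10, 3, 5)),
]


def running_active(sessions):
    """+1 in the minute a session starts, -1 in the minute it ends, then a running sum."""
    deltas = Counter()
    for _sid, ts_start, ts_end in sessions:
        deltas[ts_start.replace(second=0, microsecond=0)] += 1
        deltas[ts_end.replace(second=0, microsecond=0)] -= 1
    minutes = sorted(deltas)  # only minutes with at least one start or stop
    totals = accumulate(deltas[m] for m in minutes)
    return list(zip(minutes, totals))


result = running_active(sessions)
for minute, total in result:
    print(minute.strftime("%H:%M"), total)
```

Two properties are worth noting in this sketch. Only minutes that contain a start or a stop produce a row, so quiet minutes would have to be filled in separately. A session also stops contributing in the minute it ends: for 10:02 the running total is 1, whereas the overlap predicate from the question matches both rows in that minute.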