Question

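I have a table of user activity records, each with a time range given by a start and an end timestamp. I'm looking for the number of users active in the system per unit of time (per minute in the query below) over the previous day.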

Sessions last at most one hour and never cross an hour boundary. A session can end and a new session can start in the same minute.

Here is a stripped-down version of the query:

with minutes AS (
    -- ignore this...it generates a day's worth of timestamps for each minute
    -- it's hairy but is what I'm stuck with on redshift
    select (dateadd(minute, -row_number() over (order by true), sysdate::date)) as minute
        from seed_table limit 1440
),
sessions as (
    select sid, ts_start, ts_end
    from user_sessions s
    where ts_end >= sysdate::date-'1 day'::interval 
        and ts_start < sysdate::date
)
select m.minute, count(distinct(s.sid))
from minutes m
left join sessions s on s.ts_end >= m.minute and s.ts_start < m.minute+'1 min'::interval
group by 1
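As a reference for what the join computes, here is a brute-force Python sketch. The function names and sample sessions are hypothetical, and sessions are assumed to be `(sid, ts_start, ts_end)` tuples of naive datetimes. It applies the same predicate as the `left join` (`ts_end >= minute and ts_start < minute + 1 minute`) to every (minute, session) pair, which is essentially what the nested loop in the plan does. Because the end test is inclusive, a session is still counted in the minute it ends, so when one session ends and another starts in the same minute, that minute sees two distinct `sid`s.

```python
from datetime import datetime, timedelta

ONE_MINUTE = timedelta(minutes=1)


def minutes_of_day(day_start):
    """Return the 1440 minute boundaries of one day (what the `minutes` CTE builds)."""
    return [day_start + i * ONE_MINUTE for i in range(1440)]


def active_per_minute(sessions, minutes):
    """Brute-force mirror of the LEFT JOIN + count(distinct sid).

    A session (sid, ts_start, ts_end) is counted in minute m when
    ts_end >= m and ts_start < m + 1 minute, the same predicate as the join.
    Every (minute, session) pair is tested, like the nested loop in the plan.
    """
    counts = {}
    for m in minutes:
        sids = {sid for sid, start, end in sessions
                if end >= m and start < m + ONE_MINUTE}
        counts[m] = len(sids)
    return counts


# Hypothetical sample: session "a" ends and session "b" starts during 10:02.
day = datetime(2024, 1, 1)
sessions = [
    ("a", datetime(2024, 1, 1, 10, 0, 15), datetime(2024, 1, 1, 10, 2, 30)),
    ("b", datetime(2024, 1, 1, 10, 2, 45), datetime(2024, 1, 1, 10, 4, 0)),
]
counts = active_per_minute(sessions, minutes_of_day(day))
```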

I'm trying to avoid this nasty nested-loop left join in the query plan:

->  XN Nested Loop Left Join DS_BCAST_INNER  (cost=6913826151.95..4727012848741.55 rows=410434560 width=166)
    Join Filter: (("inner".ts_start < ("outer"."minute" + '00:01:00'::interval)) AND ("inner".ts_end >= "outer"."minute"))

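Based on Gordon Linoff's answer, this gets me close. Users get counted when their sessions transition within a minute. It still seems like the right direction, though. The original query can over-count for the same reason, but being able to count distinct session IDs per minute would fix that.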

select minute, sum(count) over (order by minute rows unbounded preceding) as users
from (
    select minute, sum(count) as count
    from (
        (
            select date_trunc('minute', ts_start) as minute, count(*) as count
            from sessions
            group by 1
        ) union all (
            select date_trunc('minute', ts_end) as minute, - count(*) as count
            from sessions
            group by 1
        )
    ) s1
    group by minute
) s2
order by minute;
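The start/end delta idea in the query above can be sketched in Python to see where it differs from the join. This is an illustrative sketch with hypothetical names (`trunc_minute`, `running_counts`, `end_offset`) and sample data, and it assumes each row has a unique `sid`. With the decrement placed at `date_trunc('minute', ts_end)`, a session stops counting in the minute it ends, so the hand-off minute shows one session rather than the join's two. Moving the decrement one minute later (`end_offset=timedelta(minutes=1)`) counts each session from its start minute through its end minute inclusive, which matches the join's `ts_end >= minute` rule.

```python
from collections import Counter
from datetime import datetime, timedelta


def trunc_minute(ts):
    """Python stand-in for date_trunc('minute', ts)."""
    return ts.replace(second=0, microsecond=0)


def running_counts(sessions, end_offset=timedelta(0)):
    """+1 at each start minute, -1 at each end minute (shifted by end_offset),
    summed per minute, then accumulated in minute order like
    sum(...) over (order by minute rows unbounded preceding)."""
    deltas = Counter()
    for _sid, start, end in sessions:
        deltas[trunc_minute(start)] += 1
        deltas[trunc_minute(end) + end_offset] -= 1
    result = {}
    running = 0
    for minute in sorted(deltas):
        running += deltas[minute]
        result[minute] = running
    return result


# Hypothetical sample: session "a" ends and session "b" starts during 10:02.
sessions = [
    ("a", datetime(2024, 1, 1, 10, 0, 15), datetime(2024, 1, 1, 10, 2, 30)),
    ("b", datetime(2024, 1, 1, 10, 2, 45), datetime(2024, 1, 1, 10, 4, 0)),
]
as_written = running_counts(sessions)                       # decrement at the end minute
inclusive = running_counts(sessions, timedelta(minutes=1))  # decrement one minute later
```

In both variants, minutes with no starts or ends produce no row; their value equals the previous row's running total, so a gap-free per-minute series would still need those minutes filled in (for example, by left joining the per-minute deltas to the `minutes` CTE on equality before taking the running sum).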

For comparison, here are the timings on one hour of data:

Original query: 81301.345 ms
Summation query: 36242.342 ms

Answer 1

This can be done much faster by counting the session starts and stops in each minute and then taking a cumulative sum. Something like this:

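select minute,
       -- running total of (starts - ends) up to and including this minute;
       -- the inner sum is needed because cnt is not in the GROUP BY, and
       -- Redshift requires a frame clause on an ordered aggregate window
       sum(sum(cnt)) over (order by minute rows unbounded preceding) as active_sessions
from ((select date_trunc('minute', ts_start) as minute, count(*) as cnt  -- +1 per session start
       from sessions
       group by 1
      ) union all
      (select date_trunc('minute', ts_end), - count(*)                  -- -1 per session end
       from sessions
       group by 1
      )
     ) s
group by minute   -- merge starts and ends that fall in the same minute
order by minute;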
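To try the answer's query shape end to end, here is a Python sketch that runs an adapted version on an in-memory SQLite database. It is an adaptation, not the Redshift query: SQLite has no `date_trunc`, so `strftime('%Y-%m-%d %H:%M:00', ...)` stands in for it, SQLite does not accept parentheses around `union all` members, and a SQLite build with window functions (3.25 or later) is assumed. The per-minute deltas are summed in a subquery before the running total is taken, and the explicit `rows unbounded preceding` frame mirrors what Redshift requires for an ordered aggregate window. The table layout, `running_session_counts` helper, and sample rows are hypothetical.

```python
import sqlite3

# Adapted for SQLite: strftime(...) replaces date_trunc('minute', ...).
QUERY = """
SELECT minute,
       SUM(cnt) OVER (ORDER BY minute ROWS UNBOUNDED PRECEDING) AS active_sessions
FROM (
    SELECT minute, SUM(cnt) AS cnt          -- net change per minute
    FROM (
        SELECT strftime('%Y-%m-%d %H:%M:00', ts_start) AS minute, COUNT(*) AS cnt
        FROM sessions GROUP BY 1            -- +1 per session start
        UNION ALL
        SELECT strftime('%Y-%m-%d %H:%M:00', ts_end), -COUNT(*)
        FROM sessions GROUP BY 1            -- -1 per session end
    ) AS deltas
    GROUP BY minute
) AS per_minute
ORDER BY minute
"""


def running_session_counts(rows):
    """Load (sid, ts_start, ts_end) rows into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sessions (sid TEXT, ts_start TEXT, ts_end TEXT)")
        con.executemany("INSERT INTO sessions VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# Hypothetical sample: session "a" ends and session "b" starts during 10:02.
rows = [
    ("a", "2024-01-01 10:00:15", "2024-01-01 10:02:30"),
    ("b", "2024-01-01 10:02:45", "2024-01-01 10:04:00"),
]
result = running_session_counts(rows)
```

As with the answer's query, only minutes containing a start or an end appear in the output, and the hand-off minute (10:02 here) counts one session because the ending session's decrement lands in that same minute.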

Count of sessions per minute derived from start and end time spans
