Counting sessions over the trailing X days

Time: 2016-11-18 17:38:51

Tags: postgresql amazon-redshift

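My database has a table of customer connection logs, and I am trying to count, for each day, the number of connections each customer made over the trailing seven days. The source table I am using has the schema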
uuid, sessionuid, connection_timestamp

The output I want is

uuid, _date, total_connections_over_trailing_seven_days,

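so that, for a given customer account and date, I can see how many times that person connected over the preceding seven days (or some other window).

The query I wrote to do this is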

SELECT
  uuid, 
  connection_timestamp::date as _date, 
  COUNT(sessionuid) OVER (ORDER BY timestamp_session ROWS 6 PRECEDING) as trailing_seven_day_session_count
FROM connection_history_table

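But when I run this query, I get a separate row for every user and every connection_timestamp in the source table, rather than a single record per unique connection_timestamp::date. Also, the values in trailing_seven_day_session_count climb from 1 to a maximum of 7 (if there were at least 7 sessions on a given day) and do not go any higher. So I appear to be counting the sessions on a particular date, but only up to the first 7.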

uuid     _date              trailing_seven_day_session_count
16398   2015-02-18 00:00:00 1
16398   2015-02-18 00:00:00 2
16398   2015-02-18 00:00:00 3
16398   2015-02-18 00:00:00 4
16398   2015-02-18 00:00:00 5
16398   2015-02-18 00:00:00 6
16398   2015-02-18 00:00:00 7
16398   2015-02-18 00:00:00 8
16398   2015-02-18 00:00:00 8
16398   2015-02-25 00:00:00 1
16398   2015-02-25 00:00:00 2
16398   2015-02-25 00:00:00 3
16398   2015-02-25 00:00:00 4
16398   2015-02-25 00:00:00 5
16398   2015-02-25 00:00:00 6
16398   2015-02-25 00:00:00 7
16398   2015-02-25 00:00:00 8
16398   2015-02-25 00:00:00 8

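I am not familiar with window functions, and it is not clear to me what I am doing wrong here. I tried partitioning by connection_timestamp::date, but that did not help either. I am basically grasping at straws, without success.

Thanks, Brad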

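1 Answer:

Answer 0: (score: 1)

You probably need to count the sessions per day first, and then sum those counts over the preceding days. Something like this: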

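select
    uuid,
    day,
    -- each input row is one day, so a 7-row frame spans 7 days only if every day has a row
    sum(sessions) over (partition by uuid order by day rows 6 preceding) as trailing_seven_day_session_count
from (
    -- collapse the log to one row per customer per day
    select uuid, connection_timestamp::date as day, count(*) as sessions
    from connection_history_table
    group by 1, 2
) daily
order by 1, 2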
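
To see why aggregating per day first matters, here is a minimal sketch that runs both shapes of query against an in-memory SQLite database from Python. It is an illustration under assumptions, not the Redshift setup from the question: the sample rows are made up, SQLite's date() stands in for ::date, the frame is ordered by connection_timestamp, and a SQLite build with window-function support (3.25 or later) is assumed.

```python
import sqlite3

# Hypothetical sample data: one customer, 9 sessions on 2015-02-18 and 2 on 2015-02-19.
rows = [(16398, f"a{i}", f"2015-02-18 10:{i:02d}:00") for i in range(9)]
rows += [(16398, f"b{i}", f"2015-02-19 09:{i:02d}:00") for i in range(2)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connection_history_table "
    "(uuid INTEGER, sessionuid TEXT, connection_timestamp TEXT)"
)
conn.executemany("INSERT INTO connection_history_table VALUES (?, ?, ?)", rows)

# Frame applied to raw log rows: ROWS 6 PRECEDING means "this row plus the
# six rows before it", so the count can never exceed 7, however many days pass.
per_row = [r[0] for r in conn.execute("""
    SELECT COUNT(sessionuid) OVER (ORDER BY connection_timestamp ROWS 6 PRECEDING)
    FROM connection_history_table
    ORDER BY connection_timestamp
""")]

# Frame applied to one-row-per-day aggregates: seven rows now mean seven
# days, provided no day is missing from the aggregate.
per_day = conn.execute("""
    SELECT uuid, day,
           SUM(sessions) OVER (PARTITION BY uuid ORDER BY day ROWS 6 PRECEDING)
    FROM (SELECT uuid, date(connection_timestamp) AS day, COUNT(*) AS sessions
          FROM connection_history_table
          GROUP BY uuid, date(connection_timestamp)) AS daily
    ORDER BY uuid, day
""").fetchall()

print(per_row)  # [1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7]
print(per_day)  # [(16398, '2015-02-18', 9), (16398, '2015-02-19', 11)]
```

The first result plateaus at 7 because the frame counts rows, not days. The second gives one row per customer per day with a running total over the last seven daily rows.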

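Regarding Brad's comment about sparse data, here is one possible approach. It generates zero-session records to fill in the missing days, so that looking back a fixed number of rows corresponds to looking back that number of days. I haven't run this, but it should be very close. Because it generates the days itself, the overall time range will need adjusting. I'm not sure I got the dates and the padding right... it tries to pull 37 days of data in order to produce 30 days of records.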

with days as (
    -- hack to generate days in redshift like a generate_series function;
    -- stv_blocklist is only used as a source of at least 37 rows
    select dateadd(day, -row_number() over (order by true), sysdate::date)::date as day
    from stv_blocklist
    limit 37
),
day_counts as (
    select uuid, connection_timestamp::date as day, count(*) as sessions
    from connection_history_table
    where connection_timestamp >= sysdate-37
    group by 1, 2
),
zero_days as (
    -- one zero-session row per recently active customer per generated day
    select s.uuid, d.day, 0 as sessions
    from (
        select distinct uuid from connection_history_table
        where connection_timestamp >= sysdate-37
    ) s
    cross join days d
)
select uuid, day, trailing_seven_day_session_count
from (
    select
        uuid,
        day,
        sum(sessions) over (partition by uuid order by day rows 6 preceding) as trailing_seven_day_session_count
    from (
        select uuid, day, sessions from day_counts
        union all
        -- keep a zero row only where the customer has no real row for that day
        select z.uuid, z.day, z.sessions
        from zero_days z
        left join day_counts c on z.uuid = c.uuid and z.day = c.day
        where c.uuid is null
    ) filled
) trailing
-- filter after the window is computed, so the first reported days still see their preceding rows
where day >= sysdate-30
order by 1, 2
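
The effect of the zero-filling can be checked outside the database. The following is a minimal Python sketch, not a translation of the query: the per-day counts are made up, and trailing_by_rows and zero_fill are hypothetical helper names that mimic a "rows 6 preceding" sum and the padding step.

```python
from datetime import date, timedelta

# Hypothetical per-day session counts for one customer; most days have no row.
day_counts = {
    date(2015, 2, 18): 9,
    date(2015, 2, 25): 9,
    date(2015, 2, 27): 1,
}

def trailing_by_rows(counts, window=7):
    """Mimic SUM(sessions) OVER (ORDER BY day ROWS window-1 PRECEDING)."""
    days = sorted(counts)
    return {
        d: sum(counts[x] for x in days[max(0, i - window + 1): i + 1])
        for i, d in enumerate(days)
    }

def zero_fill(counts):
    """Add a zero-session entry for every missing day between first and last."""
    start, end = min(counts), max(counts)
    span = (end - start).days + 1
    all_days = (start + timedelta(days=k) for k in range(span))
    return {d: counts.get(d, 0) for d in all_days}

sparse = trailing_by_rows(day_counts)            # frame spans 7 rows, not 7 days
dense = trailing_by_rows(zero_fill(day_counts))  # frame spans exactly 7 days

print(sparse[date(2015, 2, 25)])  # 18: wrongly includes 2015-02-18
print(dense[date(2015, 2, 25)])   # 9: only 2015-02-19 through 2015-02-25
```

Without padding, the row-based frame on 2015-02-25 reaches back to 2015-02-18, which is eight calendar days inclusive. With one row per day, seven rows and seven days coincide.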