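I have a connection log table for customers in my database, and I am trying to count, on a daily basis, the number of connections each customer made over the trailing seven days. The source table I'm working with has
the schema uuid, sessionuid, connection_timestamp
The output I want is
uuid, _date, total_connections_over_trailing_seven_days
so that, for a given customer account and a given date, I can see how many times that person connected over the previous seven days (or some other window).
The query I wrote to do this is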
SELECT
uuid,
connection_timestamp::date as _date,
COUNT(sessionuid) OVER (ORDER BY timestamp_session ROWS 6 PRECEDING) as trailing_seven_day_session_count
FROM connection_history_table
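But when I run this query, I get a separate row for every user and every connection_timestamp in the source table, rather than a single record per unique connection_timestamp::date. In addition, the value in trailing_seven_day_session_count climbs from 1 to a maximum of 7 (if there are at least 7 sessions on a given day) and does not increase after that. So it looks like I am counting the sessions on a particular day, but only up to the first 7 sessions.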
uuid _date trailing_seven_day_session_count
16398 2015-02-18 00:00:00 1
16398 2015-02-18 00:00:00 2
16398 2015-02-18 00:00:00 3
16398 2015-02-18 00:00:00 4
16398 2015-02-18 00:00:00 5
16398 2015-02-18 00:00:00 6
16398 2015-02-18 00:00:00 7
16398 2015-02-18 00:00:00 8
16398 2015-02-18 00:00:00 8
16398 2015-02-25 00:00:00 1
16398 2015-02-25 00:00:00 2
16398 2015-02-25 00:00:00 3
16398 2015-02-25 00:00:00 4
16398 2015-02-25 00:00:00 5
16398 2015-02-25 00:00:00 6
16398 2015-02-25 00:00:00 7
16398 2015-02-25 00:00:00 8
16398 2015-02-25 00:00:00 8
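I'm not familiar with window functions, and it isn't clear to me what I'm doing wrong here. I have tried partitioning by connection_timestamp::date, but that didn't help either. I'm basically grasping at straws, and it isn't working.
Thanks, Brad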
Answer 0 (score: 1)
Maybe you need to count the sessions per day first, and then sum over the preceding days. Something like this:
select
uuid,
day,
sum(sessions) over (partition by uuid order by day rows 6 preceding) as trailing_seven_day_session_count
from (select uuid, connection_timestamp::date as day, count(*) as sessions
from connection_history_table
group by 1,2) as daily
order by 1,2
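To see the effect of aggregating first, here is a minimal sketch that runs the same shape of query against a tiny in-memory table. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions) rather than Redshift, so date(...) stands in for the ::date cast, and the six sample rows are made up for illustration. The frame is written out as ROWS BETWEEN 6 PRECEDING AND CURRENT ROW, which is the long form of ROWS 6 PRECEDING.

```python
import sqlite3

# In-memory stand-in for connection_history_table (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connection_history_table "
    "(uuid INTEGER, sessionuid INTEGER, connection_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO connection_history_table VALUES (?, ?, ?)",
    [
        (16398, 1, "2015-02-18 09:00:00"),
        (16398, 2, "2015-02-18 10:30:00"),
        (16398, 3, "2015-02-18 17:45:00"),
        (16398, 4, "2015-02-19 08:15:00"),
        (16398, 5, "2015-02-20 12:00:00"),
        (16398, 6, "2015-02-20 13:00:00"),
    ],
)

# Step 1 (inner query): collapse to one row per uuid and day.
# Step 2 (outer query): sum the current row plus the 6 rows before it.
query = """
SELECT uuid,
       day,
       SUM(sessions) OVER (
           PARTITION BY uuid
           ORDER BY day
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS trailing_seven_day_session_count
FROM (
    SELECT uuid, date(connection_timestamp) AS day, COUNT(*) AS sessions
    FROM connection_history_table
    GROUP BY uuid, date(connection_timestamp)
) AS daily
ORDER BY uuid, day
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the window now runs over one row per day instead of one row per session, the count is no longer capped at 7 sessions. Note that the frame still counts rows, not calendar days, so it only equals "seven days" when every day has a row.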
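Regarding Brad's comment about sparse data, here is one possible approach. It generates zero-count records to fill in the missing days, so that looking back a fixed number of records corresponds to looking back that number of days. I haven't run this, but it should be pretty close. Because it generates the days itself, the overall time range needs to be adjusted. I'm not sure I got the dates and the padding right... it tries to fetch 37 days of data in order to produce 30 days of records.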
with days as (
-- hack to generate days in redshift like a generate_series function
select (dateadd(day, -row_number() over (order by true), sysdate::date)) as day
from stv_blocklist limit 37
),
day_counts as (
select uuid, connection_timestamp::date as day, count(*) sessions
from connection_history_table
where connection_timestamp >= sysdate-37
group by 1,2
),
zero_days as (
select s.uuid, d.day, 0 as sessions
from (
select distinct uuid from connection_history_table
where connection_timestamp >= sysdate-37
) s
cross join days d
)
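select uuid, day, trailing_seven_day_session_count
from (
select
uuid,
day,
sum(sessions) over (partition by uuid order by day rows 6 preceding) as trailing_seven_day_session_count
from (
select uuid, day, sessions from day_counts
union all
-- qualify the columns with z, since uuid and day exist in both joined tables
select z.uuid, z.day, z.sessions from zero_days z
left join day_counts c on z.uuid=c.uuid and z.day=c.day
where c.uuid is null
) filled
) windowed
-- filter after the window is computed, so the 7 padding days still feed the sums
where day >= sysdate-30
order by 1,2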
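The zero-filling idea can be checked on a small sparse data set. The sketch below is an illustration under several assumptions, not the Redshift query itself: it uses Python's sqlite3 module (SQLite 3.25 or later), a recursive CTE in place of the stv_blocklist trick, a fixed date range in place of sysdate, and a left join with COALESCE in place of the union all plus anti-join. The four sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connection_history_table "
    "(uuid INTEGER, sessionuid INTEGER, connection_timestamp TEXT)"
)
# Sparse sample data (invented): no sessions between 02-18 and 02-25.
conn.executemany(
    "INSERT INTO connection_history_table VALUES (?, ?, ?)",
    [
        (16398, 1, "2015-02-18 09:00:00"),
        (16398, 2, "2015-02-18 10:00:00"),
        (16398, 3, "2015-02-25 11:00:00"),
        (16398, 4, "2015-02-27 12:00:00"),
    ],
)

query = """
WITH RECURSIVE days(day) AS (
    -- one row per calendar day in a fixed range (assumed for the example)
    SELECT '2015-02-18'
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < '2015-02-27'
),
day_counts AS (
    SELECT uuid, date(connection_timestamp) AS day, COUNT(*) AS sessions
    FROM connection_history_table
    GROUP BY uuid, date(connection_timestamp)
),
filled AS (
    -- every uuid paired with every day; days without sessions get 0
    SELECT u.uuid, d.day, COALESCE(c.sessions, 0) AS sessions
    FROM (SELECT DISTINCT uuid FROM connection_history_table) AS u
    CROSS JOIN days AS d
    LEFT JOIN day_counts AS c ON c.uuid = u.uuid AND c.day = d.day
)
SELECT uuid,
       day,
       SUM(sessions) OVER (
           PARTITION BY uuid
           ORDER BY day
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS trailing_seven_day_session_count
FROM filled
ORDER BY uuid, day
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With the gaps filled, each partition has exactly one row per day, so a frame of 7 rows covers exactly 7 calendar days. Without the filling step, the row for 2015-02-25 would also pick up the two sessions from 2015-02-18, which is seven days earlier and outside the intended window.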