How to use window functions to get counts for today, the last 7 days, and the last 30 days for each value?

Asked: 2016-01-06 23:33:40

Tags: sql etl amazon-redshift window-functions

My question seems simple:

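For a given date, I want three numbers: the active users on that date, the active users from that date minus 7 days up to that date, and the active users from that date minus 30 days up to that date.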

For example, with this sample data:

"timestamp" "user_public_id"
"23-Sep-15" "805a47023fa611e58ebb22000b680490"
"28-Sep-15" "d842b5bc5b1711e5a84322000b680490"
"01-Oct-15" "ac6b5f70b95911e0ac5312313d06dad5"
"21-Oct-15" "8c3e91e2749f11e296bb12313d086540"
"29-Nov-15" "b144298810ee11e4a3091231390eb251"

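For 01-Oct-15, the count for today would be 1, last_7_days would be 3, and last_30_days would be 3 + n (where n is the number of user_ids on dates before 1 October that fall within the 30-day window).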

I'm on Amazon Redshift. Could someone provide some sample SQL to help me get started? The output should look like this:

"timestamp" "users_today", "users_last_7_days", "users_30_days"
"01-Oct-15"           1                 3           (3+n)
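To pin down the intended metric, here is a minimal Python sketch (not SQL, and not Redshift-specific) that computes, for each distinct date in the sample, the number of distinct users seen on that date, in the 7 calendar days ending on it, and in the 30 calendar days ending on it. Two assumptions are built in: "active users" means distinct `user_public_id` values, and "last 7 days" means the date itself plus the 6 days before it (likewise for 30). The names `SESSIONS`, `active_users` and `metrics` are hypothetical.

```python
from datetime import date, timedelta

# The question's sample rows, with "23-Sep-15" etc. parsed into date objects.
SESSIONS = [
    (date(2015, 9, 23), "805a47023fa611e58ebb22000b680490"),
    (date(2015, 9, 28), "d842b5bc5b1711e5a84322000b680490"),
    (date(2015, 10, 1), "ac6b5f70b95911e0ac5312313d06dad5"),
    (date(2015, 10, 21), "8c3e91e2749f11e296bb12313d086540"),
    (date(2015, 11, 29), "b144298810ee11e4a3091231390eb251"),
]


def active_users(sessions, day, window_days):
    """Distinct users seen in the calendar window [day - (window_days - 1), day]."""
    start = day - timedelta(days=window_days - 1)
    return len({user for ts, user in sessions if start <= ts <= day})


def metrics(sessions):
    """One tuple per distinct date: (date, users_today, users_last_7_days, users_last_30_days)."""
    return [
        (
            day,
            active_users(sessions, day, 1),
            active_users(sessions, day, 7),
            active_users(sessions, day, 30),
        )
        for day in sorted({ts for ts, _ in sessions})
    ]


for row in metrics(SESSIONS):
    print(row)
```

On this sample, 23-Sep-15 is eight days before 01-Oct-15, so a 7-day window that includes 01-Oct itself gives 2 for users_last_7_days, while the 30-day window gives 3.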

2 answers:

Answer 0 (score: 2)

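I know that asking for help or posting incomplete solutions is frowned upon, but this question hasn't received any other attention, so I thought I'd do what I can.

I've been pulling my hair out over this; alas, I'm a beginner and some things just aren't clicking for me. Maybe you or someone else can greatly improve on my answer, but I think I'm on the right track.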

SELECT replace(convert(varchar, [timestamp], 111), '/','-') AS [timestamp], -- style 111 is yyyy/mm/dd; the replace turns it into yyyy-mm-dd
(SELECT COUNT([TIMESTAMP]) FROM #SIMPLE WHERE ([TIMESTAMP]) = ([timestamp])) AS users_today,
(SELECT COUNT([TIMESTAMP]) FROM #SIMPLE WHERE [TIMESTAMP] BETWEEN DATEADD(DY,-7,[TIMESTAMP]) AND [TIMESTAMP]) AS users_last_7_days ,
(SELECT COUNT([TIMESTAMP]) FROM #SIMPLE WHERE [TIMESTAMP] BETWEEN DATEADD(DY,-30,[TIMESTAMP]) AND [timestamp]) AS users_last_30_days
FROM #SIMPLE
GROUP BY [timestamp]

Set up the sample data first:

CREATE TABLE #SIMPLE (
[timestamp] datetime, user_public_id varchar(32)
)

INSERT INTO #SIMPLE 
VALUES('23-Sep-15','805a47023fa611e58ebb22000b680490'),
('28-Sep-15','d842b5bc5b1711e5a84322000b680490'),
('01-Oct-15','ac6b5f70b95911e0ac5312313d06dad5'),
('21-Oct-15','8c3e91e2749f11e296bb12313d086540'),
('29-Nov-15','b144298810ee11e4a3091231390eb251')

The problem I'm running into is that every row contains the same counts, even though I'm grouping by [timestamp].
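A likely cause: inside each subquery, the unqualified `[TIMESTAMP]` references resolve to the subquery's own `#SIMPLE`, not to the outer row, so a condition like `[TIMESTAMP] = [timestamp]` compares a column with itself and is true for every non-NULL row. Every output row therefore gets the same total. Aliasing the outer and inner tables, and comparing against the outer alias, turns these into correlated subqueries. The sketch below shows the aliased version using Python's built-in `sqlite3` module as a stand-in for SQL Server or Redshift. It assumes a table named `simple`, a column `ts` holding ISO date strings (renamed from `timestamp`), an inclusive `[date - 6 days, date]` window, and `COUNT(DISTINCT user_public_id)` so that users rather than rows are counted; date arithmetic uses SQLite's `date()` instead of `DATEADD`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for #SIMPLE; dates stored as ISO 'YYYY-MM-DD' text so they compare correctly.
conn.execute("CREATE TABLE simple (ts TEXT, user_public_id TEXT)")
conn.executemany(
    "INSERT INTO simple VALUES (?, ?)",
    [
        ("2015-09-23", "805a47023fa611e58ebb22000b680490"),
        ("2015-09-28", "d842b5bc5b1711e5a84322000b680490"),
        ("2015-10-01", "ac6b5f70b95911e0ac5312313d06dad5"),
        ("2015-10-21", "8c3e91e2749f11e296bb12313d086540"),
        ("2015-11-29", "b144298810ee11e4a3091231390eb251"),
    ],
)

# Outer rows come from d (one per distinct date); each subquery scans t and
# compares t.ts with the OUTER d.ts. Without the aliases, both sides of the
# comparison would refer to the inner table.
QUERY = """
SELECT d.ts,
       (SELECT COUNT(DISTINCT t.user_public_id) FROM simple t
         WHERE t.ts = d.ts) AS users_today,
       (SELECT COUNT(DISTINCT t.user_public_id) FROM simple t
         WHERE t.ts BETWEEN date(d.ts, '-6 days') AND d.ts) AS users_last_7_days,
       (SELECT COUNT(DISTINCT t.user_public_id) FROM simple t
         WHERE t.ts BETWEEN date(d.ts, '-29 days') AND d.ts) AS users_last_30_days
FROM (SELECT DISTINCT ts FROM simple) AS d
ORDER BY d.ts
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

In SQL Server the same idea would use something like `DATEADD(DAY, -6, s.[timestamp])` with `s` as the outer alias; whether the offset should be -6 or -7 depends on whether "last 7 days" is meant to include the current date.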

Answer 1 (score: 0)

Step 1 -- create a table containing the daily counts.

create temp table daily_mobile_Sessions as
select "timestamp" ,
count(user_public_id) over (partition by  "timestamp"  ) as "today"
from mobile_sessions 
group by 1, mobile_sessions.user_public_id
order by 1 DESC
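To make the shape of this intermediate table concrete: because the query groups by both the date and `user_public_id`, it returns one row per (date, user) pair, and `COUNT(...) OVER (PARTITION BY "timestamp")` then counts those pairs. The result is the number of distinct users on that date, repeated on every row for that date. The sketch below checks this with Python's `sqlite3` (SQLite 3.25 or later is needed for window functions) on a small hypothetical `mobile_sessions` table, and compares the de-duplicated result with a plain `COUNT(DISTINCT user_public_id) ... GROUP BY` query. The column name `ts` stands in for `"timestamp"`, and the user IDs `u1`/`u2` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mobile_sessions (ts TEXT, user_public_id TEXT)")
# Hypothetical sessions: u1 has two sessions on 2015-10-01.
conn.executemany(
    "INSERT INTO mobile_sessions VALUES (?, ?)",
    [
        ("2015-10-01", "u1"),
        ("2015-10-01", "u1"),
        ("2015-10-01", "u2"),
        ("2015-10-02", "u1"),
    ],
)

# Step 1 as written: one row per (day, user), each carrying that day's user count.
step1 = conn.execute(
    """
    SELECT ts, COUNT(user_public_id) OVER (PARTITION BY ts) AS today
    FROM mobile_sessions
    GROUP BY ts, user_public_id
    ORDER BY ts
    """
).fetchall()

# Simpler equivalent: one row per day with the number of distinct users.
daily = conn.execute(
    """
    SELECT ts, COUNT(DISTINCT user_public_id) AS today
    FROM mobile_sessions
    GROUP BY ts
    ORDER BY ts
    """
).fetchall()

print(step1)  # two rows for 2015-10-01, one for 2015-10-02
print(daily)
```

The duplicate rows are why Step 2 has to group by "timestamp" and today again before applying the window functions.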

Step 2 -- from the table above, take the "today" field and apply window functions to compute the rolling sums.

select "timestamp", today,
sum(today) over (order by "timestamp" rows between 6 PRECEDING and CURRENT ROW) as "last_7days",
sum(today) over (order by "timestamp" rows between 29 PRECEDING and CURRENT ROW) as "last_30days"
 from daily_mobile_Sessions group by "timestamp"  , 2 order by 1 desc
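Two caveats about Step 2 should be kept in mind. First, `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW` covers the previous 6 rows of `daily_mobile_Sessions`, which equals the previous 6 calendar days only if every day has a row. Second, summing daily counts counts a user once for each day they were active, so the totals are not distinct users over the window. The Python/`sqlite3` sketch below (SQLite 3.25+ for window functions) illustrates the first point on the question's sample dates, each with one user, by comparing the row-based frame with a date-based correlated subquery. The table `daily` and column `ts` are hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical daily table built from the question's sample: one user per day.
conn.execute("CREATE TABLE daily (ts TEXT, today INTEGER)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [
        ("2015-09-23", 1),
        ("2015-09-28", 1),
        ("2015-10-01", 1),
        ("2015-10-21", 1),
        ("2015-11-29", 1),
    ],
)

# Step 2 as written: the frame is the previous 6 ROWS, whatever dates they hold.
ROWS_FRAME = """
SELECT ts, SUM(today) OVER (ORDER BY ts ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
FROM daily ORDER BY ts
"""

# Calendar-day alternative: sum the days whose date falls in [ts - 6 days, ts].
CALENDAR_FRAME = """
SELECT d.ts, (SELECT SUM(x.today) FROM daily x
              WHERE x.ts BETWEEN date(d.ts, '-6 days') AND d.ts)
FROM daily d ORDER BY d.ts
"""

by_rows = dict(conn.execute(ROWS_FRAME).fetchall())
by_days = dict(conn.execute(CALENDAR_FRAME).fetchall())
for ts in sorted(by_rows):
    print(ts, "rows:", by_rows[ts], "calendar days:", by_days[ts])
```

On 01-Oct-15 the row-based frame reaches back to 23-Sep-15 and returns 3, whereas the 7-calendar-day window returns 2. One common workaround is to join the daily counts to a calendar table with one row per day (using 0 for days without sessions), so that N PRECEDING rows equals N preceding days. Getting distinct users per window, rather than a sum of daily counts, needs a join back to the raw sessions by date range instead of a sum.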