Cohort analysis in SQL when recounting users

Asked: 2017-08-22 08:06:44

Tags: sql amazon-redshift

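I am trying to build a cohort query in SQL. In a typical cohort analysis, we take the users who performed a particular action in a given period and count how many of those same users perform the action again in each later period.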

WITH by_week
AS (SELECT
  user_id,
  TD_DATE_TRUNC('week', login_time) AS login_week
FROM logins
GROUP BY 1, 2),
with_first_week
AS (SELECT
  user_id,
  login_week,
  FIRST_VALUE(login_week) OVER (PARTITION BY user_id ORDER BY login_week) AS first_week
FROM by_week),
with_week_number
AS (SELECT
  user_id,
  login_week,
  first_week,
  (login_week - first_week) / (24 * 60 * 60 * 7) AS week_number
FROM with_first_week)
SELECT
  TD_TIME_FORMAT(first_week, 'yyyy-MM-dd') AS first_week,
  SUM(CASE WHEN week_number = 1 THEN 1 ELSE 0 END) AS week_1,
  SUM(CASE WHEN week_number = 2 THEN 1 ELSE 0 END) AS week_2,
  SUM(CASE WHEN week_number = 3 THEN 1 ELSE 0 END) AS week_3,
  SUM(CASE WHEN week_number = 4 THEN 1 ELSE 0 END) AS week_4,
  SUM(CASE WHEN week_number = 5 THEN 1 ELSE 0 END) AS week_5,
  SUM(CASE WHEN week_number = 6 THEN 1 ELSE 0 END) AS week_6,
  SUM(CASE WHEN week_number = 7 THEN 1 ELSE 0 END) AS week_7,
  SUM(CASE WHEN week_number = 8 THEN 1 ELSE 0 END) AS week_8,
  SUM(CASE WHEN week_number = 9 THEN 1 ELSE 0 END) AS week_9
FROM with_week_number
GROUP BY 1
ORDER BY 1
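
To make the three steps of the query above concrete (one row per user per week, first week per user, week number since the first week), here is a minimal Python sketch of the same logic. It assumes that weeks start on Monday and that logins arrive as `(user_id, date)` pairs; `week_start`, `cohort_table`, and the sample data are hypothetical and not part of the original query.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d):
    # Monday of the week containing d (assumed week boundary).
    return d - timedelta(days=d.weekday())


def cohort_table(logins):
    """logins: iterable of (user_id, date).

    Returns {first_week: {week_number: distinct users}}.
    """
    # by_week: distinct (user, week) pairs, like GROUP BY 1, 2.
    by_week = {(user, week_start(d)) for user, d in logins}

    # with_first_week: earliest login week per user.
    first_week = {}
    for user, week in by_week:
        if user not in first_week or week < first_week[user]:
            first_week[user] = week

    # with_week_number plus the final aggregation by first_week.
    table = defaultdict(lambda: defaultdict(int))
    for user, week in by_week:
        week_number = (week - first_week[user]).days // 7
        table[first_week[user]][week_number] += 1
    return {fw: dict(cols) for fw, cols in table.items()}


# Hypothetical sample data: users a and b start in the week of 2017-05-01,
# user c starts in the week of 2017-05-08.
logins = [
    ("a", date(2017, 5, 1)), ("a", date(2017, 5, 2)), ("a", date(2017, 5, 9)),
    ("b", date(2017, 5, 3)), ("b", date(2017, 5, 17)),
    ("c", date(2017, 5, 10)), ("c", date(2017, 5, 16)),
]
result = cohort_table(logins)
```

Each key of `result` is a cohort (a first login week), and each inner key is a week number, which corresponds to one `week_N` column of the SQL output.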

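But suppose that, for now, I care less about the first-time/user-level analysis and only want to know whether my login activity increases over time (i.e. I want to add the first cohort's week-2 logins to the second cohort's week-1 logins). Is there a simple/elegant way to do this?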

Edit:

An example is given below:

WeekStart    Week1               Week2            Week3
2017/05/03   66                  **53**           **49**
2017/05/10   (**53** + 74)       (**49** + 70)    **65**
2017/05/17   (**49** + 70 + 45)  (**65** + 80)    etc.
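
One way to read the Week1 column of this example is as a sum along the anti-diagonals of the per-cohort table: each calendar week's total adds up every cohort's count at the age it has reached in that week. The Python sketch below reproduces that arithmetic. It assumes that 74, 70, and 45 are the second and third cohorts' own counts, which the example suggests but does not state; `calendar_week_totals` is a hypothetical helper name.

```python
# cohorts[i][k] = logins of cohort i in its (k + 1)-th week,
# using the numbers from the example above.
cohorts = [
    [66, 53, 49],  # cohort starting 2017/05/03
    [74, 70],      # cohort starting 2017/05/10 (assumed own counts)
    [45],          # cohort starting 2017/05/17 (assumed own count)
]


def calendar_week_totals(cohorts):
    # Cohort i at age k is active in calendar week i + k,
    # so summing over i + k collapses the cohorts into weekly totals.
    totals = [0] * len(cohorts)
    for i, counts in enumerate(cohorts):
        for k, n in enumerate(counts):
            if i + k < len(totals):
                totals[i + k] += n
    return totals


totals = calendar_week_totals(cohorts)
```

With these inputs the totals are 66, then 53 + 74, then 49 + 70 + 45, matching the first column of the example.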

1 Answer:

Answer 0 (score: 1)

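I think you need to group by login_week instead of first_week, so that each row counts all logins in a given week rather than the logins of a single cohort. You then have to use >= instead of =, so that each column adds up, for that week, the users from every cohort that is at least that many weeks old.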

WITH
-- One row per user per week in which the user logged in
by_week AS (
    SELECT
        user_id,
        TD_DATE_TRUNC('week', login_time) AS login_week
    FROM logins
    GROUP BY 1, 2
),
-- Attach each user's first login week to every row
with_first_week AS (
    SELECT
        user_id,
        login_week,
        FIRST_VALUE(login_week) OVER (PARTITION BY user_id ORDER BY login_week) AS first_week
    FROM by_week
),
-- Weeks elapsed since the first login week (difference in seconds / seconds per week)
with_week_number AS (
    SELECT
        user_id,
        login_week,
        first_week,
        (login_week - first_week) / (24 * 60 * 60 * 7) AS week_number
    FROM with_first_week
)
-- Group by calendar week; each column counts users at least N weeks past their first login
SELECT
    TD_TIME_FORMAT(login_week, 'yyyy-MM-dd') AS login_week,
    SUM(CASE WHEN week_number >= 1 THEN 1 ELSE 0 END) AS week_1,
    SUM(CASE WHEN week_number >= 2 THEN 1 ELSE 0 END) AS week_2,
    SUM(CASE WHEN week_number >= 3 THEN 1 ELSE 0 END) AS week_3,
    SUM(CASE WHEN week_number >= 4 THEN 1 ELSE 0 END) AS week_4,
    SUM(CASE WHEN week_number >= 5 THEN 1 ELSE 0 END) AS week_5,
    SUM(CASE WHEN week_number >= 6 THEN 1 ELSE 0 END) AS week_6,
    SUM(CASE WHEN week_number >= 7 THEN 1 ELSE 0 END) AS week_7,
    SUM(CASE WHEN week_number >= 8 THEN 1 ELSE 0 END) AS week_8,
    SUM(CASE WHEN week_number >= 9 THEN 1 ELSE 0 END) AS week_9
FROM with_week_number
GROUP BY 1
ORDER BY 1;
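
To see what the `>=` columns end up counting, here is a small Python sketch of the final aggregation only. It takes rows shaped like the output of `with_week_number`, reduced to `(login_week, week_number)`, and applies the same conditions as the `SUM(CASE ...)` columns. The function name `cumulative_by_login_week`, the `max_week` parameter, and the sample rows are hypothetical.

```python
from collections import defaultdict


def cumulative_by_login_week(rows, max_week=3):
    """rows: iterable of (login_week, week_number), one per user per week.

    Returns {login_week: [week_1, week_2, ..., week_max_week]}.
    """
    table = defaultdict(lambda: [0] * max_week)
    for login_week, week_number in rows:
        counts = table[login_week]  # ensures the week appears even with all zeros
        for k in range(1, max_week + 1):
            # Same condition as: SUM(CASE WHEN week_number >= k THEN 1 ELSE 0 END)
            if week_number >= k:
                counts[k - 1] += 1
    return dict(table)


# Hypothetical rows: two new users in the first week, then returning users later.
rows = [
    ("2017-05-01", 0), ("2017-05-01", 0),
    ("2017-05-08", 1), ("2017-05-08", 0),
    ("2017-05-15", 2), ("2017-05-15", 1),
]
result = cumulative_by_login_week(rows)
```

Rows with `week_number = 0` (a user's first week) satisfy none of the `>= k` conditions, so new users do not contribute to any column. If the total number of active users per week is what is wanted, a column with the condition `week_number >= 0` would be one way to get it.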