I have a table that looks like this:
Date | User_ID
2017-1-1 | 1
2017-1-1 | 2
2017-1-1 | 4
2017-1-2 | 3
2017-1-2 | 2
... | ..
... | ..
... | ..
... | ..
2017-2-1 | 1
2017-2-2 | 2
... | ..
... | ..
... | ..
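I want to count monthly active users (MAU) over a rolling 30-day period. I know Redshift doesn't support COUNT(DISTINCT) as a window function. How can I get the following output?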
Date | MAU
2017-1-1 | 3
2017-1-2 | 4 <- We don't want to count user_id 2 twice.
... | ..
... | ..
... | ..
2017-2-1 | ..
2017-2-2 | ..
... | ..
... | ..
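The following brute-force reference in Python pins down what the MAU column should contain. It is an illustration, not a Redshift solution. It assumes "rolling 30-day period" means the 30 calendar days ending on and including each date. `rolling_distinct_users` and `window_days` are hypothetical names.

```python
from datetime import date, timedelta

def rolling_distinct_users(events, window_days=30):
    """Map each event date to the number of distinct users seen in the
    window_days calendar days ending on (and including) that date."""
    users_by_day = {}
    for day, user in events:
        users_by_day.setdefault(day, set()).add(user)

    result = {}
    for end in sorted(users_by_day):
        start = end - timedelta(days=window_days - 1)
        active = set()
        for day, users in users_by_day.items():
            if start <= day <= end:
                active |= users  # set union, so repeat visits count once
        result[end] = len(active)
    return result

# Only the rows that are visible in the question's table
events = [
    (date(2017, 1, 1), 1), (date(2017, 1, 1), 2), (date(2017, 1, 1), 4),
    (date(2017, 1, 2), 3), (date(2017, 1, 2), 2),
    (date(2017, 2, 1), 1), (date(2017, 2, 2), 2),
]

for day, mau in sorted(rolling_distinct_users(events).items()):
    print(day, mau)
```

On these rows it prints `2017-01-01 3`, `2017-01-02 4`, `2017-02-01 1` and `2017-02-02 2`. The February values would differ once the elided rows are included.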
I tried the following (and it obviously failed). Here is my code:
SELECT event_date
      ,sum(user_count) mau_count
      ,CASE
         WHEN event_date = date_trunc('week', event_date)
           THEN 1
         ELSE 0
       END week_starting
FROM (
  SELECT event_date
        ,count(*) OVER (PARTITION BY event_date ORDER BY event_date
                        ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
                       ) AS user_count  -- I know this is wrong. Just my attempt :)
  FROM (
    SELECT DISTINCT (user_id)
          ,event_date
    FROM event_table
  ) daily_distinct_users
  GROUP BY event_date
) cumulative_daily_distinct_users
GROUP BY event_date;
Please tell me how to get an accurate MAU count. Thanks!
Answer 0 (score: 1)
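No window frame can turn a sum of per-day distinct counts into MAU. A user who is active on several days in the window is counted once for each of those days, which is exactly the double-counting of user_id 2 that the question wants to avoid. A small Python check on the first two days of the question's data shows the effect (variable names are illustrative):

```python
from datetime import date

# Rows visible in the question's table for the first two days
events = [
    (date(2017, 1, 1), 1), (date(2017, 1, 1), 2), (date(2017, 1, 1), 4),
    (date(2017, 1, 2), 3), (date(2017, 1, 2), 2),
]

# Per-day distinct users, like a daily_distinct_users subquery
daily = {}
for day, user in events:
    daily.setdefault(day, set()).add(user)
daily_counts = {day: len(users) for day, users in daily.items()}

# Adding up the per-day counts (what a running SUM over days does)
summed = sum(daily_counts.values())
# Taking the union of users first, then counting (what MAU needs)
true_mau = len(set().union(*daily.values()))

print(summed, true_mau)  # 5 4
```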
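Assuming no dates are missing, you can first use the MIN function to get the date on which each user first appears. Then count the users for each date, and finally use the SUM window function to get the rolling total.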
SELECT DISTINCT EVENT_DATE,
       SUM(CNT) OVER (ORDER BY EVENT_DATE ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS MAU
FROM
  (SELECT E.EVENT_DATE,
          COUNT(DISTINCT T.USER_ID) AS CNT
   FROM EVENT_TABLE E
   LEFT JOIN
     (SELECT DISTINCT USER_ID,
             MIN(EVENT_DATE) OVER (PARTITION BY USER_ID
                                   ORDER BY EVENT_DATE
                                   ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS FIRST_APPEARED_ON
      FROM EVENT_TABLE
     ) T ON T.FIRST_APPEARED_ON = E.EVENT_DATE AND T.USER_ID = E.USER_ID
   GROUP BY E.EVENT_DATE
  ) T1
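The "no missing dates" assumption matters because `ROWS BETWEEN 30 PRECEDING AND CURRENT ROW` counts rows, not calendar days. If some dates have no rows, the frame reaches back more than 30 days. The Python sketch below shows this effect and one common remedy: fill in every calendar day with a zero count before taking the rolling sum. The helper names and data are hypothetical, and the sketch only illustrates the frame behaviour, not the full MAU query. In SQL, the filling step is usually a join against a calendar (date) table.

```python
from datetime import date, timedelta

def densify(daily_counts):
    """Return (day, count) pairs for every calendar day from the first to the
    last key, using 0 for days that have no rows."""
    day, last = min(daily_counts), max(daily_counts)
    rows = []
    while day <= last:
        rows.append((day, daily_counts.get(day, 0)))
        day += timedelta(days=1)
    return rows

def rows_rolling_sum(rows, preceding=30):
    """Mimic SUM(cnt) OVER (ORDER BY day ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW)."""
    return [(rows[i][0], sum(cnt for _, cnt in rows[max(0, i - preceding):i + 1]))
            for i in range(len(rows))]

# Hypothetical per-day counts with a gap: no rows between Jan 2 and Feb 1
counts = {date(2017, 1, 1): 3, date(2017, 1, 2): 1, date(2017, 2, 1): 2}

sparse = rows_rolling_sum(sorted(counts.items()))
dense = rows_rolling_sum(densify(counts))

print(sparse[-1][1])                  # 6: the "last 31 rows" still reach back to Jan 1
print(dict(dense)[date(2017, 2, 1)])  # 3: only Jan 2..Feb 1 are summed
```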
Answer 1 (score: 1)
This seems to work (the columns in the log table are named dt and userid):
SELECT
  end_date,
  -- The number of distinct users during the 30 days prior
  COUNT(DISTINCT userid) distinct_users
FROM log
JOIN
  ( -- A list of dates to appear in the output first column
    SELECT DISTINCT dt AS end_date
    FROM log
    WHERE dt BETWEEN date '2017-01-01' AND date '2017-01-31'
  ) AS dates ON dt BETWEEN end_date - interval '30 days' AND end_date
GROUP BY end_date
ORDER BY end_date
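Basically, the subselect produces the list of end_date values that appear in the first output column. Each end_date is then joined to every log row from the 30 days before it up to and including end_date, and COUNT(DISTINCT userid) per end_date gives the number of distinct userids in that window.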
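To check the shape of this self-join, you can run it against SQLite from Python. SQLite has no interval type, so this sketch makes three changes. It stores dates as ISO-format text, replaces `end_date - interval '30 days'` with SQLite's `date(end_date, '-30 days')`, and drops the January-only WHERE filter. Because BETWEEN is inclusive, the window covers 31 calendar days: end_date plus the 30 days before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (dt TEXT, userid INTEGER)")  # ISO dates stored as text
conn.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [("2017-01-01", 1), ("2017-01-01", 2), ("2017-01-01", 4),
     ("2017-01-02", 3), ("2017-01-02", 2),
     ("2017-02-01", 1), ("2017-02-02", 2)],
)

# Same join shape as the answer, with date(...) in place of the interval arithmetic
query = """
SELECT end_date,
       COUNT(DISTINCT userid) AS distinct_users
FROM log
JOIN (SELECT DISTINCT dt AS end_date FROM log) AS dates
  ON log.dt BETWEEN date(dates.end_date, '-30 days') AND dates.end_date
GROUP BY end_date
ORDER BY end_date
"""
for end_date, users in conn.execute(query):
    print(end_date, users)
```

On these rows it prints `2017-01-01 3`, `2017-01-02 4`, `2017-02-01 3` and `2017-02-02 2`. The Feb 1 value includes users 2 and 3 from Jan 2, because Jan 2 is exactly 30 days earlier.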
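Answer 2 (score: 0)
For anyone who stumbles on this question looking for more, the following blog post describes an alternative pre-computation strategy for computing rolling MAU quickly. It's a bit of overkill for this question, but it may come in handy if you do this: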
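Here is one possible pre-computation, sketched in Python. It is not necessarily the strategy the blog post describes. For each user, merge their active days into ranges of window-end dates during which that user counts as active. Record +1 where a range starts and -1 where it ends, then take a running sum over days. This counts each user at most once per day without any COUNT(DISTINCT), and the running-sum step corresponds to an ordinary SUM window function. The function name and the inclusive 30-day window are assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

def mau_by_running_sum(events, window_days=30):
    """Rolling distinct-user counts built from +1/-1 deltas and a running sum.

    A user active on day a counts for every window end in [a, a + window_days - 1].
    Overlapping ranges of the same user are merged first, so no user is counted twice.
    """
    days_by_user = defaultdict(set)
    for day, user in events:
        days_by_user[user].add(day)

    span = timedelta(days=window_days)
    delta = defaultdict(int)  # precomputed per-day change in the active-user count
    for days in days_by_user.values():
        ordered = sorted(days)
        start, end = ordered[0], ordered[0] + span  # end is exclusive
        for day in ordered[1:]:
            if day <= end:   # overlaps or touches the current range: extend it
                end = day + span
            else:            # gap: close the current range and start a new one
                delta[start] += 1
                delta[end] -= 1
                start, end = day, day + span
        delta[start] += 1
        delta[end] -= 1

    result, running = {}, 0
    for day in sorted(delta):
        running += delta[day]
        result[day] = running
    return result

events = [
    (date(2017, 1, 1), 1), (date(2017, 1, 1), 2), (date(2017, 1, 1), 4),
    (date(2017, 1, 2), 3), (date(2017, 1, 2), 2),
    (date(2017, 2, 1), 1), (date(2017, 2, 2), 2),
]

for day, mau in mau_by_running_sum(events).items():
    print(day, mau)
```

The returned dict only has entries for days where the count changes. For any other day, use the value from the most recent earlier key. On the question's visible rows it gives 3 for 2017-01-01, 4 for 2017-01-02, 1 for 2017-02-01 and 2 for 2017-02-02, which matches a brute-force count over the same 30-day window.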