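I store daily log data in a Postgres database, structured by id and date. Naturally, a user can have multiple rows in the database if they log in more than once.
To visualize: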
| id | timestamp |
|------|---------------------|
| 0099 | 2004-10-19 10:23:54 |
| 1029 | 2004-10-01 10:23:54 |
| 2353 | 2004-10-20 08:23:54 |
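Suppose MAU ("monthly active users") is defined as the number of unique ids that logged in during a given calendar month. I want a running total of MAU for each day within a month, i.e. how MAU grows at different points in time. For example, if we look at October 2014: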
| date | MAU |
|------------|-------|
| 2014-10-01 | 10000 |
| 2014-10-02 | 12948 |
| 2014-10-03 | 13465 |
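And so on until the end of the month. I have heard that window functions might be one way to solve this. Any ideas on how to use them to get a rolling MAU total?
Answer 0 (score: 1)
After reading the documentation for Postgres window functions, here is a solution that gets the rolling MAU total for the current month: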
-- First, get the id and day of each timestamp within the current month
WITH raw_data AS (
    SELECT id, date_trunc('day', timestamp) AS timestamp
    FROM user_logs
    WHERE date_trunc('month', timestamp) = date_trunc('month', current_timestamp)
),
-- Since we only want to count the earliest login in the month
-- for a given id, use MIN() to aggregate
month_data AS (
    SELECT id, MIN(timestamp) AS timestamp_day
    FROM raw_data
    GROUP BY id
)
-- Postgres doesn't support DISTINCT for window functions, so query
-- from the rolling count to get one row per day
SELECT timestamp_day AS date, MAX(count) AS MAU
FROM (
    SELECT timestamp_day, COUNT(id) OVER (ORDER BY timestamp_day) AS count
    FROM month_data
) foo
GROUP BY timestamp_day
ORDER BY timestamp_day;
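As a cross-check of the logic in the query above (keep only each id's first login day in the month, then take a running total of new ids per day), here is a minimal Python sketch. The function name `rolling_mau` and the sample rows are made up for illustration, and it assumes all rows already fall within one calendar month.

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate


def rolling_mau(logins):
    """Return [(day, cumulative distinct ids)] for (id, timestamp) rows.

    Assumption: every row falls within the same calendar month.
    """
    # Earliest login day per id (the MIN(...) GROUP BY id step).
    first_seen = {}
    for user_id, ts in logins:
        day = ts.date()
        if user_id not in first_seen or day < first_seen[user_id]:
            first_seen[user_id] = day

    # Number of ids first seen on each day, then a running total.
    new_per_day = Counter(first_seen.values())
    days = sorted(new_per_day)
    totals = accumulate(new_per_day[d] for d in days)
    return list(zip(days, totals))


# Hypothetical sample rows, not taken from the question's data.
sample_logins = [
    ("0099", datetime(2014, 10, 1, 10, 23, 54)),
    ("1029", datetime(2014, 10, 1, 11, 0, 0)),
    ("0099", datetime(2014, 10, 2, 9, 0, 0)),
    ("2353", datetime(2014, 10, 2, 8, 23, 54)),
    ("1029", datetime(2014, 10, 3, 8, 0, 0)),
]
result = rolling_mau(sample_logins)
for day, mau in result:
    print(day, mau)
```

In this sketch, as in the query, a day on which no id logs in for the first time produces no row; filling such gaps (for example by joining against a generated list of days) would be a separate step.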
Answer 1 (score: 0)
For a given month, you can calculate this by adding each user in on the first day of the month on which they were seen:
select date_trunc('day', mints) as day, count(*) as usersOnDay,
       sum(count(*)) over (order by date_trunc('day', mints)) as cume_users
from (select id, min(timestamp) as mints
      from log
      where timestamp >= '2004-10-01'::date and timestamp < '2004-11-01'::date
      group by id
     ) l
group by date_trunc('day', mints)
order by date_trunc('day', mints);
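Note: this answers the question for a single month. It can be extended to more calendar months, where you count the unique users on the first day and then add the increments.
If your question is about a cumulative period that crosses month boundaries, then ask another question and explain what a month means in that case.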
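One possible reading of the extension to several calendar months is to restart the running total at the start of each month, which in SQL would roughly correspond to partitioning the window by month. The Python sketch below illustrates that reading only; `rolling_mau_by_month` and the sample rows are hypothetical, and it is not a statement of what the answer's author had in mind.

```python
from collections import Counter
from datetime import datetime


def rolling_mau_by_month(logins):
    """Return {(year, month): [(day, cumulative distinct ids), ...]}.

    Assumption: an id counts once per calendar month, on its first
    login day in that month, and the total resets every month.
    """
    # Earliest login day per (month, id) pair.
    first_seen = {}
    for user_id, ts in logins:
        key = (ts.year, ts.month, user_id)
        day = ts.date()
        if key not in first_seen or day < first_seen[key]:
            first_seen[key] = day

    new_per_day = Counter(first_seen.values())

    result = {}
    running = 0
    current_month = None
    for day in sorted(new_per_day):
        month = (day.year, day.month)
        if month != current_month:
            # New calendar month: restart the running total.
            current_month, running = month, 0
        running += new_per_day[day]
        result.setdefault(month, []).append((day, running))
    return result


# Hypothetical sample rows spanning two months.
two_month_logins = [
    ("a", datetime(2014, 10, 1, 8, 0, 0)),
    ("b", datetime(2014, 10, 2, 9, 0, 0)),
    ("a", datetime(2014, 10, 2, 9, 30, 0)),
    ("a", datetime(2014, 11, 1, 7, 0, 0)),
    ("b", datetime(2014, 11, 3, 12, 0, 0)),
    ("c", datetime(2014, 11, 3, 13, 0, 0)),
]
by_month = rolling_mau_by_month(two_month_logins)
for month, rows in by_month.items():
    print(month, [(d.isoformat(), n) for d, n in rows])
```

A cumulative window that crosses month boundaries (for example, a trailing 30 days) needs a different definition and is not covered by this sketch.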