SQL: Calculating MAU with window functions

Time: 2017-01-30 09:54:07

Tags: sql window-functions vertica

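I am trying to calculate MAU (monthly distinct active users) with a window function, without success. For each day of the current month, I need the count over the preceding 30 days.

Here is what I have so far: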

select 
  t.datee
, t.app
, i.sourcee
, i.campaign
, t.mobile
, sum(count(distinct t.user_id)) over (
     PARTITION BY 
       date_trunc('month',datee)
     , t.app
    , i.sourcee
    , i.campaign
    , t.mobile 
    ORDER BY datee asc 
    ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
  )
FROM dim_x i
JOIN agg_y t
  ON  i.app=t.app
 AND i.mobile=t.mobile
WHERE t.datee>=CURRENT_DATE-30
  AND t.datee<CURRENT_DATE  
GROUP BY 1,2,3,4,5
order by 1 desc

But all I get is the sum of the active users, not the count of distinct users. I am using Vertica.

Any suggestions?

1 answer:

Answer 0: (score: 0)

I don't really see why you would need an OLAP expression.

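Are you looking for the total number of distinct users per:

  • year-month combination derived from datee
  • app
  • sourcee (whatever that may be)
  • campaign
  • mobile (possibly a mobile phone number)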

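As far as I can tell, a simple GROUP BY would do. If I ignore sourcee, campaign and mobile, select from a single table called input to illustrate the idea, and use some sample data that I just made up, this query: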

SELECT
  YEAR(datee) * 100 + MONTH(datee) AS yearmonth
, app
, COUNT(DISTINCT user_id) AS monthly_active_users
FROM input
GROUP BY 1,2
ORDER BY 1
;

... will return:

YEARMONTH|app  |monthly_active_users
   201601|app-a|                   2
   201601|app-b|                   2
   201602|app-a|                   2
   201602|app-b|                   2
   201603|app-a|                   2
   201603|app-b|                   2
   201604|app-a|                   2
   201604|app-b|                   2
   201605|app-a|                   2
   201605|app-b|                   2
   201606|app-a|                   1
   201606|app-b|                   1
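
To bring back the columns that were left out, the same GROUP BY could be applied to the join from the question. This is only a sketch: it assumes that dim_x and agg_y join exactly as in the original query and that agg_y.user_id identifies the user, and it has not been run against the real tables.

```sql
-- Monthly distinct users per app, sourcee, campaign and mobile (sketch).
-- Table and column names are taken from the question; their contents are assumed.
SELECT
  YEAR(t.datee) * 100 + MONTH(t.datee) AS yearmonth
, t.app
, i.sourcee
, i.campaign
, t.mobile
, COUNT(DISTINCT t.user_id) AS monthly_active_users
FROM dim_x i
JOIN agg_y t
  ON  i.app    = t.app
 AND i.mobile = t.mobile
GROUP BY 1,2,3,4,5
ORDER BY 1
;
```

Because COUNT(DISTINCT ...) is evaluated once per group, a user who is active on several days of the same month is counted only once for that month.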

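Just editing my previous answer. It seems you need a running COUNT DISTINCT of the user ids, partitioned by several expressions.

With the input from the WITH clause below, would you need a report like this (showing only the first 12 of 53 rows, ordered by datee, app)?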

datee     |app  |user_id |running_active_users
2016-01-01|app-a|arthur  |                   1
2016-01-04|app-b|ford    |                   1
2016-01-07|app-a|trillian|                   2
2016-01-10|app-b|zaphod  |                   2
2016-01-13|app-a|arthur  |                   2
2016-01-16|app-b|ford    |                   2
2016-01-19|app-a|trillian|                   2
2016-01-22|app-b|zaphod  |                   2
2016-01-25|app-a|arthur  |                   2
2016-01-28|app-b|ford    |                   2
2016-01-31|app-a|trillian|                   2
2016-02-03|app-b|zaphod  |                   2

If so, I don't see the reason for the GROUP BY clause in your query.
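
A report of that shape can be produced without GROUP BY. One possible way, which is not necessarily how the rows above were generated, is to flag the first appearance of each user per app and keep a running sum of those flags. The sketch assumes a relation input(datee, app, user_id), such as the WITH clause used in this answer.

```sql
-- Running count of distinct users per app, ordered by date (sketch).
SELECT
  datee
, app
, user_id
  -- default frame: everything up to and including the current datee
, SUM(is_first) OVER (PARTITION BY app ORDER BY datee) AS running_active_users
FROM (
  SELECT
    datee
  , app
  , user_id
    -- 1 on the first row of each (app, user_id) pair, 0 on every later row
  , CASE
      WHEN ROW_NUMBER() OVER (PARTITION BY app, user_id ORDER BY datee) = 1 THEN 1
      ELSE 0
    END AS is_first
  FROM input
) flagged
ORDER BY datee, app
;
```

Traced by hand against the sample data, this gives 1, 1, 2, 2, 2, ... for the first rows, in line with the report above. Adding the year-month expression to both PARTITION BY lists would restart the count at the beginning of every month.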

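Here is the GROUP BY query again, with the test data in a WITH clause; it returns the monthly result shown earlier. Think of input as the result of the join between your two tables.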

WITH
input(datee,app,user_id) AS (
          SELECT DATE '2016-01-01','app-a','arthur'
UNION ALL SELECT DATE '2016-01-04','app-b','ford'
UNION ALL SELECT DATE '2016-01-07','app-a','trillian'
UNION ALL SELECT DATE '2016-01-10','app-b','zaphod'
UNION ALL SELECT DATE '2016-01-25','app-a','arthur'
UNION ALL SELECT DATE '2016-01-28','app-b','ford'
UNION ALL SELECT DATE '2016-03-04','app-b','ford'
UNION ALL SELECT DATE '2016-03-25','app-a','arthur'
UNION ALL SELECT DATE '2016-04-09','app-b','ford'
UNION ALL SELECT DATE '2016-04-30','app-a','arthur'
UNION ALL SELECT DATE '2016-05-06','app-a','trillian'
UNION ALL SELECT DATE '2016-05-09','app-b','zaphod'
UNION ALL SELECT DATE '2016-05-15','app-b','ford'
UNION ALL SELECT DATE '2016-06-05','app-a','arthur'
UNION ALL SELECT DATE '2016-01-13','app-a','arthur'
UNION ALL SELECT DATE '2016-01-16','app-b','ford'
UNION ALL SELECT DATE '2016-01-31','app-a','trillian'
UNION ALL SELECT DATE '2016-02-03','app-b','zaphod'
UNION ALL SELECT DATE '2016-02-06','app-a','arthur'
UNION ALL SELECT DATE '2016-02-09','app-b','ford'
UNION ALL SELECT DATE '2016-02-12','app-a','trillian'
UNION ALL SELECT DATE '2016-02-15','app-b','zaphod'
UNION ALL SELECT DATE '2016-02-18','app-a','arthur'
UNION ALL SELECT DATE '2016-02-21','app-b','ford'
UNION ALL SELECT DATE '2016-02-24','app-a','trillian'
UNION ALL SELECT DATE '2016-02-27','app-b','zaphod'
UNION ALL SELECT DATE '2016-03-01','app-a','arthur'
UNION ALL SELECT DATE '2016-03-10','app-b','zaphod'
UNION ALL SELECT DATE '2016-03-13','app-a','arthur'
UNION ALL SELECT DATE '2016-03-16','app-b','ford'
UNION ALL SELECT DATE '2016-03-28','app-b','ford'
UNION ALL SELECT DATE '2016-03-31','app-a','trillian'
UNION ALL SELECT DATE '2016-04-06','app-a','arthur'
UNION ALL SELECT DATE '2016-04-12','app-a','trillian'
UNION ALL SELECT DATE '2016-04-15','app-b','zaphod'
UNION ALL SELECT DATE '2016-04-27','app-b','zaphod'
UNION ALL SELECT DATE '2016-05-03','app-b','ford'
UNION ALL SELECT DATE '2016-05-27','app-b','ford'
UNION ALL SELECT DATE '2016-05-30','app-a','trillian'
UNION ALL SELECT DATE '2016-01-19','app-a','trillian'
UNION ALL SELECT DATE '2016-01-22','app-b','zaphod'
UNION ALL SELECT DATE '2016-03-07','app-a','trillian'
UNION ALL SELECT DATE '2016-03-19','app-a','trillian'
UNION ALL SELECT DATE '2016-03-22','app-b','zaphod'
UNION ALL SELECT DATE '2016-04-03','app-b','zaphod'
UNION ALL SELECT DATE '2016-04-18','app-a','arthur'
UNION ALL SELECT DATE '2016-04-21','app-b','ford'
UNION ALL SELECT DATE '2016-04-24','app-a','trillian'
UNION ALL SELECT DATE '2016-05-12','app-a','arthur'
UNION ALL SELECT DATE '2016-05-18','app-a','trillian'
UNION ALL SELECT DATE '2016-05-21','app-b','zaphod'
UNION ALL SELECT DATE '2016-05-24','app-a','arthur'
UNION ALL SELECT DATE '2016-06-02','app-b','zaphod'
)
SELECT
  YEAR(datee) * 100 + MONTH(datee) AS YEARMONTH
, app
, COUNT(DISTINCT user_id) AS monthly_active_users
FROM input
GROUP BY 1,2
ORDER BY 1
;
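
If the figure actually needed is the one described in the question, the number of distinct users over the 30 days ending on each day, then neither the monthly GROUP BY nor SUM(COUNT(DISTINCT ...)) OVER will produce it, because adding up daily distinct counts counts a user once per active day. A minimal sketch using a range self-join follows. It assumes the input relation above (prepend the same WITH clause, or substitute the real join), a 30-day window that includes the reporting day itself, and date-minus-integer arithmetic as in the question's CURRENT_DATE-30.

```sql
-- Rolling 30-day distinct users per app and day (sketch).
SELECT
  d.datee
, d.app
, COUNT(DISTINCT w.user_id) AS active_users_30d
FROM (
  -- one row per reporting day and app
  SELECT DISTINCT datee, app FROM input
) d
JOIN input w
  ON  w.app   = d.app
 AND w.datee >  d.datee - 30   -- window starts 29 days before the reporting day
 AND w.datee <= d.datee        -- and ends on the reporting day itself
GROUP BY 1,2
ORDER BY 1,2
;
```

Days without any activity do not appear in d, so a calendar table would be needed to report them. On large tables the range join may be expensive, and restricting both sides to the date range of interest first is likely to help.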