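DAU and MAU (daily active users and monthly active users) are an established way of measuring user engagement.
How can I get these numbers using SQL and Google BigQuery?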
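Answer 0 (score: 4)
2019 standard SQL update:
(To understand the utility of DAU/MAU, see articles like http://blog.compariscope.wefi.com/mobile-app-usage-dau-mau)
Let's play with the reddit comments data stored in BigQuery (the queries below use legacy SQL syntax). We want to find the DAU/MAU ratio for the 'AskReddit' subreddit during September, on a rolling daily basis: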
    SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
    FROM (
      SELECT day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
      FROM (
        SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, author
        FROM [fh-bigquery:reddit_comments.2015_09]
        WHERE subreddit='AskReddit') a
      JOIN (
        SELECT stopday, EXACT_COUNT_DISTINCT(author) mau
        FROM (SELECT created_utc, subreddit, author FROM [fh-bigquery:reddit_comments.2015_09], [fh-bigquery:reddit_comments.2015_08]) a
        CROSS JOIN (
          SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) stopday
          FROM [fh-bigquery:reddit_comments.2015_09]
          GROUP BY 1
        ) b
        WHERE subreddit='AskReddit'
        AND SEC_TO_TIMESTAMP(created_utc) BETWEEN DATE_ADD(stopday, -30, 'day') AND TIMESTAMP(stopday)
        GROUP BY 1
      ) b
      ON a.day=b.stopday
      GROUP BY 1
    )
    ORDER BY 1
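The rolling window can be easier to follow on a tiny example. What follows is a minimal sketch, not the BigQuery query itself: it assumes Python's built-in sqlite3 module, a made-up `comments` table with toy rows, and a 30-day window that includes the stop day.

```python
import sqlite3

# Hypothetical toy data: (day, author) pairs for a single subreddit.
rows = [
    ("2015-08-03", "alice"),
    ("2015-08-20", "bob"),
    ("2015-09-01", "carol"),
    ("2015-09-02", "bob"),
    ("2015-09-10", "carol"),
    ("2015-09-10", "dave"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (day TEXT, author TEXT)")
conn.executemany("INSERT INTO comments VALUES (?, ?)", rows)

query = """
WITH daily AS (
    -- DAU for each September day
    SELECT day, COUNT(DISTINCT author) AS dau
    FROM comments
    WHERE day >= '2015-09-01'
    GROUP BY day
),
monthly AS (
    -- For each September day, distinct authors seen in the
    -- 30 days ending on (and including) that day
    SELECT d.day AS stopday, COUNT(DISTINCT c.author) AS mau
    FROM daily AS d
    JOIN comments AS c
      ON c.day BETWEEN date(d.day, '-29 days') AND d.day
    GROUP BY d.day
)
SELECT daily.day, daily.dau, monthly.mau,
       100 * daily.dau / monthly.mau AS daumau  -- integer percentage
FROM daily
JOIN monthly ON daily.day = monthly.stopday
ORDER BY daily.day
"""
rolling = conn.execute(query).fetchall()
for row in rolling:
    print(row)
```

Each September day is matched against every comment in its own 30-day window, which is where the extra processing cost comes from.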
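This query gets the DAU for each day in September, and also looks into the August data to get the MAU for every 30-day period ending on each of those days. This takes a lot of processing (30x). We can get almost equivalent results if we calculate just one MAU for September and use that value as the denominator throughout: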
    SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
    FROM (
      SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
      FROM [fh-bigquery:reddit_comments.2015_09] a
      CROSS JOIN (
        SELECT EXACT_COUNT_DISTINCT(author) mau
        FROM [fh-bigquery:reddit_comments.2015_09]
        WHERE subreddit='AskReddit'
      ) b
      WHERE subreddit='AskReddit'
      GROUP BY 1
    )
    ORDER BY 1
This is a much simpler query that gets us almost equivalent results much faster.
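The same simplification can be sketched locally. This is an illustration under assumptions, not the BigQuery query: it uses Python's sqlite3 module with a hypothetical `comments` table and toy rows, and computes one monthly MAU that every day shares as its denominator.

```python
import sqlite3

# Hypothetical toy data: (day, author) pairs, all within one month.
rows = [
    ("2015-09-01", "alice"),
    ("2015-09-01", "bob"),
    ("2015-09-02", "bob"),
    ("2015-09-03", "carol"),
    ("2015-09-03", "dave"),
    ("2015-09-03", "alice"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (day TEXT, author TEXT)")
conn.executemany("INSERT INTO comments VALUES (?, ?)", rows)

query = """
SELECT c.day,
       COUNT(DISTINCT c.author) AS dau,
       MAX(m.mau) AS mau,  -- same value on every row
       100 * COUNT(DISTINCT c.author) / MAX(m.mau) AS daumau
FROM comments AS c
CROSS JOIN (
    -- one MAU for the whole month
    SELECT COUNT(DISTINCT author) AS mau FROM comments
) AS m
GROUP BY c.day
ORDER BY c.day
"""
fixed = conn.execute(query).fetchall()
for row in fixed:
    print(row)
```

The single-row subquery is computed once and cross-joined to every comment, so no per-day window has to be rebuilt.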
Now, to get the average for this subreddit over the month:
    SELECT ROUND(100*AVG(dau/mau), 2) daumau
    FROM (
      SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
      FROM [fh-bigquery:reddit_comments.2015_09] a
      CROSS JOIN (
        SELECT EXACT_COUNT_DISTINCT(author) mau
        FROM [fh-bigquery:reddit_comments.2015_09]
        WHERE subreddit='AskReddit'
      ) b
      WHERE subreddit='AskReddit'
      GROUP BY 1
    )
This tells us that 'AskReddit' had an engagement of 8.95% during September.
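Because the MAU is a single constant for the month, averaging the daily ratios gives the same number as dividing the average DAU by the MAU. A small worked example, with made-up DAU and MAU values:

```python
from statistics import mean

# Hypothetical values, chosen only to illustrate the arithmetic.
daily_dau = [2, 1, 3]
mau = 4

# AVG(dau/mau): average of the daily ratios
avg_of_ratios = round(100 * mean(d / mau for d in daily_dau), 2)

# AVG(dau)/mau: average DAU divided by the constant MAU
ratio_of_avg = round(100 * mean(daily_dau) / mau, 2)

print(avg_of_ratios, ratio_of_avg)
```

The two forms would differ if the denominator changed from day to day, as it does with a rolling MAU.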
Last stop: how to compare engagement across various subreddits:
    SELECT ROUND(100*AVG(dau)/MAX(mau), 2) avg_daumau, MAX(mau) mau, subreddit
    FROM (
      SELECT a.subreddit, DATE(SEC_TO_TIMESTAMP(created_utc)) day,
             EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
      FROM [fh-bigquery:reddit_comments.2015_09] a
      JOIN (
        SELECT EXACT_COUNT_DISTINCT(author) mau, subreddit
        FROM [fh-bigquery:reddit_comments.2015_09]
        GROUP BY 2
      ) b
      ON a.subreddit=b.subreddit
      WHERE mau>50000
      GROUP BY 1, 2
    )
    GROUP BY subreddit
    ORDER BY 1
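As a rough local illustration of the per-subreddit comparison, here is a sketch using Python's sqlite3 module. The `comments` table, its rows, and the `min_mau` threshold of 1 are all hypothetical stand-ins (the query above filters on `mau>50000`).

```python
import sqlite3

# Hypothetical toy data: (subreddit, day, author).
rows = [
    ("AskReddit", "2015-09-01", "alice"),
    ("AskReddit", "2015-09-01", "bob"),
    ("AskReddit", "2015-09-02", "carol"),
    ("AskReddit", "2015-09-02", "dave"),
    ("pics", "2015-09-01", "alice"),
    ("pics", "2015-09-02", "alice"),
    ("pics", "2015-09-02", "bob"),
    ("tiny", "2015-09-01", "erin"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (subreddit TEXT, day TEXT, author TEXT)")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)

query = """
WITH monthly AS (
    SELECT subreddit, COUNT(DISTINCT author) AS mau
    FROM comments
    GROUP BY subreddit
),
daily AS (
    SELECT subreddit, day, COUNT(DISTINCT author) AS dau
    FROM comments
    GROUP BY subreddit, day
)
SELECT d.subreddit, m.mau,
       ROUND(100.0 * AVG(d.dau) / m.mau, 2) AS avg_daumau
FROM daily AS d
JOIN monthly AS m ON d.subreddit = m.subreddit
WHERE m.mau > ?  -- drop subreddits that are too small to compare
GROUP BY d.subreddit, m.mau
ORDER BY avg_daumau
"""
min_mau = 1  # hypothetical threshold for the toy data
ranking = conn.execute(query, (min_mau,)).fetchall()
for row in ranking:
    print(row)
```

The size filter matters because a subreddit with a handful of authors can show a very high ratio that says little about engagement.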
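Answer 1 (score: 2)
For trend analysis, rather than waiting for a "complete month", each day needs to be looked at together with its preceding 30 days. I'm afraid the suggested solution (by Felipe Hoffa) changes the question, and not just the data retrieval query.
Below is my take on the issue. I'm not sure what it does under the hood in terms of performance, and it is not very fast (much slower than Felipe's), but it covers the business need as I understand it. Still, if you can offer a solution that optimizes this approach, that would be great.
Please note: no joins or sub-aggregations are used, just splitting, grouping, and date manipulation.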
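One way a join-free approach of this kind might work is to fan each event out to every day whose 30-day window contains it, and then group by day. The sketch below illustrates that idea in plain Python. It is an assumption about the general technique, not the answerer's query, and the events and window length are made up.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW_DAYS = 30  # assumed window length, including the day itself

# Hypothetical toy data: (day, author) events.
events = [
    (date(2015, 8, 3), "alice"),
    (date(2015, 8, 20), "bob"),
    (date(2015, 9, 1), "carol"),
    (date(2015, 9, 2), "bob"),
]

daily = defaultdict(set)    # day -> authors active on that day
rolling = defaultdict(set)  # day -> authors active in the window ending that day

for day, author in events:
    daily[day].add(author)
    # "Split" step: the event counts toward the window ending on its own
    # day and on each of the following WINDOW_DAYS - 1 days.
    for offset in range(WINDOW_DAYS):
        rolling[day + timedelta(days=offset)].add(author)

# "Group" step: (dau, rolling mau) for each day that had activity.
report = {
    day: (len(authors), len(rolling[day]))
    for day, authors in sorted(daily.items())
}
for day, (dau, mau) in report.items():
    print(day, dau, mau)
```

The fan-out multiplies the row count by the window length, which is consistent with such a query being slower than the single-MAU version.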