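How to calculate DAU/MAU (engagement) with BigQuery

Time: 2015-10-20 01:26:39

Tags: google-bigquery

DAU and MAU (daily active users and monthly active users) are an established way of measuring user engagement.

How can I get these numbers using SQL and Google BigQuery?

2 answers:

Answer 0 (score: 4)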

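2019 standard SQL update:

(To understand the utility of DAU/MAU, see articles such as http://blog.compariscope.wefi.com/mobile-app-usage-dau-mau)

Let's use the reddit comments data stored in BigQuery. We want to find the DAU/MAU ratio for the 'AskReddit' subreddit during September, on a daily rolling basis. The query below is written in BigQuery legacy SQL: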

SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
FROM (
  SELECT day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM (
    SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, author
    FROM [fh-bigquery:reddit_comments.2015_09]
    WHERE subreddit='AskReddit') a
  JOIN (
    SELECT stopday, EXACT_COUNT_DISTINCT(author) mau
    FROM (SELECT created_utc, subreddit, author FROM [fh-bigquery:reddit_comments.2015_09], [fh-bigquery:reddit_comments.2015_08]) a
    CROSS JOIN (
      SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) stopday
      FROM [fh-bigquery:reddit_comments.2015_09]
      GROUP BY 1
    ) b
    WHERE subreddit='AskReddit'
    AND SEC_TO_TIMESTAMP(created_utc) BETWEEN DATE_ADD(stopday, -30, 'day') AND TIMESTAMP(stopday)
    GROUP BY 1
  ) b
  ON a.day=b.stopday
  GROUP BY 1
)
ORDER BY 1

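This query gets the DAU for each day in September, and also reads the August data to get the MAU for the 30-day period ending on each of those days. That takes a lot of processing (roughly 30x). We get almost the same results if we compute a single MAU for September and use that value as the denominator for every day: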

SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
FROM (
  SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM [fh-bigquery:reddit_comments.2015_09] a
  CROSS JOIN (
    SELECT EXACT_COUNT_DISTINCT(author) mau
    FROM [fh-bigquery:reddit_comments.2015_09]
    WHERE subreddit='AskReddit'
  ) b
  WHERE subreddit='AskReddit'
  GROUP BY 1
)
ORDER BY 1

This is a much simpler query, and it gives us almost the same results much faster.
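To make the difference between the two denominators concrete, here is a minimal Python sketch on made-up data. The `events` list and the helper names `dau`, `rolling_mau`, and `calendar_mau` are illustrative assumptions, not part of the queries above. It works at day granularity and treats the rolling window as the stop day plus the 30 days before it, which is an assumption about the intended window.

```python
from datetime import date, timedelta

# Hypothetical toy events as (day, author) pairs. Not real reddit data.
events = [
    (date(2015, 8, 20), "dan"),
    (date(2015, 8, 21), "eve"),
    (date(2015, 9, 1), "bob"),
    (date(2015, 9, 1), "cat"),
    (date(2015, 9, 2), "bob"),
    (date(2015, 9, 25), "ann"),
    (date(2015, 9, 25), "bob"),
]


def dau(events, day):
    """Distinct authors active on exactly `day`."""
    return len({author for d, author in events if d == day})


def rolling_mau(events, stopday, window_days=30):
    """Distinct authors active from `stopday - window_days` through `stopday`."""
    start = stopday - timedelta(days=window_days)
    return len({author for d, author in events if start <= d <= stopday})


def calendar_mau(events, year, month):
    """Distinct authors active anywhere in the given calendar month."""
    return len({author for d, author in events if (d.year, d.month) == (year, month)})


for day in sorted({d for d, _ in events if d.month == 9}):
    print(day, dau(events, day), rolling_mau(events, day), calendar_mau(events, 2015, 9))
```

On this toy data the rolling MAU for 1 September still counts two authors who only posted in August, so it is 4, while the calendar-month MAU is 3. By 25 September the two values agree.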

Now, to get the average for this subreddit over the month:

SELECT ROUND(100*AVG(dau/mau), 2) daumau
FROM (
  SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM [fh-bigquery:reddit_comments.2015_09] a
  CROSS JOIN (
    SELECT EXACT_COUNT_DISTINCT(author) mau
    FROM [fh-bigquery:reddit_comments.2015_09]
    WHERE subreddit='AskReddit'
  ) b
  WHERE subreddit='AskReddit'
  GROUP BY 1
)

This tells us that 'AskReddit' had an engagement of 8.95% during September.
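As a rough illustration of what that averaging does, here is a small Python sketch on invented data. The `events` list and the function name `average_dau_mau` are assumptions made for this example. Like the query, it only averages over days that have at least one comment.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy events for one month as (day, author) pairs.
events = [
    (date(2015, 9, 1), "bob"),
    (date(2015, 9, 1), "cat"),
    (date(2015, 9, 2), "bob"),
    (date(2015, 9, 3), "ann"),
    (date(2015, 9, 3), "bob"),
    (date(2015, 9, 3), "cat"),
    (date(2015, 9, 3), "dan"),
]


def average_dau_mau(events):
    """Average of daily DAU/MAU over the days present, as a percentage."""
    authors_by_day = defaultdict(set)
    for day, author in events:
        authors_by_day[day].add(author)
    # One MAU for the whole month, reused as the denominator for every day.
    mau = len({author for _, author in events})
    ratios = [len(authors) / mau for authors in authors_by_day.values()]
    return round(100 * sum(ratios) / len(ratios), 2)


print(average_dau_mau(events))
```

Because the MAU is the same for every day of the month, averaging dau/mau gives the same number as dividing the average DAU by the MAU, which is the form the per-subreddit query uses.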

Last stop: how to compare engagement across various subreddits:

SELECT ROUND(100*AVG(dau)/MAX(mau), 2) avg_daumau, MAX(mau) mau, subreddit
FROM (
  SELECT a.subreddit, DATE(SEC_TO_TIMESTAMP(created_utc)) day,
         EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM [fh-bigquery:reddit_comments.2015_09] a
  JOIN (
    SELECT EXACT_COUNT_DISTINCT(author) mau, subreddit
    FROM [fh-bigquery:reddit_comments.2015_09]
    GROUP BY 2
  ) b
  ON a.subreddit=b.subreddit
  WHERE mau>50000
  GROUP BY 1, 2
)
GROUP BY subreddit
ORDER BY 1
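The same per-subreddit comparison can be sketched in Python on a handful of invented rows. The `rows` data, the function name `engagement_by_subreddit`, and the `min_mau` parameter are assumptions for this example; `min_mau` plays the role of the `mau>50000` filter, scaled down to fit toy data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy rows as (subreddit, day, author). Not real reddit data.
rows = [
    ("AskReddit", date(2015, 9, 1), "ann"),
    ("AskReddit", date(2015, 9, 1), "bob"),
    ("AskReddit", date(2015, 9, 2), "ann"),
    ("AskReddit", date(2015, 9, 2), "cat"),
    ("pics", date(2015, 9, 1), "ann"),
    ("pics", date(2015, 9, 2), "dan"),
    ("tiny", date(2015, 9, 1), "eve"),
]


def engagement_by_subreddit(rows, min_mau):
    """Return (avg_daumau, mau, subreddit) tuples sorted by avg_daumau ascending."""
    daily = defaultdict(lambda: defaultdict(set))  # subreddit -> day -> authors
    monthly = defaultdict(set)                     # subreddit -> authors in the month
    for subreddit, day, author in rows:
        daily[subreddit][day].add(author)
        monthly[subreddit].add(author)

    result = []
    for subreddit, authors in monthly.items():
        mau = len(authors)
        if mau <= min_mau:  # keep only subreddits above the MAU threshold
            continue
        daus = [len(day_authors) for day_authors in daily[subreddit].values()]
        avg_dau = sum(daus) / len(daus)
        result.append((round(100 * avg_dau / mau, 2), mau, subreddit))
    return sorted(result)


for avg_daumau, mau, subreddit in engagement_by_subreddit(rows, min_mau=1):
    print(avg_daumau, mau, subreddit)
```

With `min_mau=1` the single-author subreddit is filtered out, and the remaining two are listed from least to most engaged.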


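Answer 1 (score: 2)

To analyze trends without waiting for a "full month" to end, you need to look at each day together with its preceding 30 days. I'm afraid the suggested solution (by Felipe Hoffa) changes the question, not just the data retrieval query.

Below is my take on the issue. I'm not sure what it does under the hood in terms of performance, and it isn't very quick (much slower than Felipe's), but it covers the business need as I understand it. Still, if you can offer a solution that optimizes this approach, that would be great.

Please note: no joins or sub-aggregations are used, just split, group by, and date manipulation.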
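One possible reading of "split, group by, and date manipulation" is a fan-out: each event is copied to every day whose trailing window contains it, and then a single group-by on that day counts distinct authors. The Python sketch below illustrates that idea only. It is not this answer's query, and the `events` data, the function name `rolling_dau_mau`, and the choice of a 30-day window that includes the day itself are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical toy events as (day, author) pairs.
events = [
    (date(2015, 9, 1), "ann"),
    (date(2015, 9, 1), "bob"),
    (date(2015, 9, 2), "bob"),
    (date(2015, 10, 1), "cat"),
]


def rolling_dau_mau(events, window_days=30):
    """Map each active day to (dau, rolling mau) without any join."""
    dau = defaultdict(set)
    mau = defaultdict(set)
    for day, author in events:
        dau[day].add(author)
        # Fan out: an event on `day` falls inside the trailing window of
        # `day` itself and of each of the following window_days - 1 days.
        for offset in range(window_days):
            mau[day + timedelta(days=offset)].add(author)
    # Report only days that actually have activity.
    return {day: (len(dau[day]), len(mau[day])) for day in sorted(dau)}


for day, (d, m) in rolling_dau_mau(events).items():
    print(day, d, m, round(100 * d / m))
```

The fan-out multiplies the number of rows by the window length, which is consistent with this approach being noticeably slower than computing a single monthly MAU.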
