Using BigQuery to find a running total of unique users per day

Asked: 2015-08-25 20:22:07

Tags: sql google-bigquery

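I have some user data like the sample below, and I would like a running total of the number of unique users seen per day. Starting with a basic query: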

SELECT
  day, user_id, COUNT(DISTINCT(user_id)) AS cnt

FROM
  (select "A" user_id, "2015-02-01" day),
  (select "A" user_id, "2015-02-01" day),
  (select "A" user_id, "2015-02-01" day),
  (select "A" user_id, "2015-02-01" day),
  (select "A" user_id, "2015-02-01" day),
  (select "A" user_id, "2015-02-01" day),

  (select "B" user_id, "2015-02-01" day),
  (select "B" user_id, "2015-02-02" day),
  (select "B" user_id, "2015-02-02" day),
  (select "B" user_id, "2015-02-02" day),

  (select "C" user_id, "2015-02-01" day),
  (select "C" user_id, "2015-02-02" day),

  (select "D" user_id, "2015-02-04" day)

GROUP BY
  day, user_id

The result of this grouping is:

Row day         user_id cnt  
1   2015-02-01  A        1   
2   2015-02-01  B        1   
3   2015-02-02  B        1   
4   2015-02-01  C        1   
5   2015-02-02  C        1   
6   2015-02-04  D        1

I can see that there are three unique users on 2015-02-01, and that no new users appear until 2015-02-04, when there is only one (user D).

I need the result to look like this:

Row  day         running_count
1    2015-02-01  3
2    2015-02-02  3
3    2015-02-03  3
4    2015-02-04  4

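running_count is the running tally of new users per day. For example, 2015-02-02 adds zero new users, because only user_ids B and C appear on that day and both were already counted on 2015-02-01, so the running count stays at 3.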

Thanks in advance for your help.

1 Answer:

Answer 0: (score: 1)

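Look only at each user's MIN(day), count the new users per day, then use SUM() OVER() to get the running count. The result will be missing the interim dates on which no new user appears, but you can get those with a LEFT JOIN against a list of dates.
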
SELECT day, SUM(c) OVER(ORDER BY day) AS running_count
FROM (
  SELECT day, COUNT(DISTINCT user_id) c
  FROM (
    SELECT MIN(day) day, user_id
    FROM
      (select "A" user_id, "2015-02-01" day),
      (select "A" user_id, "2015-02-01" day),
      (select "A" user_id, "2015-02-01" day),
      (select "A" user_id, "2015-02-01" day),
      (select "A" user_id, "2015-02-01" day),
      (select "A" user_id, "2015-02-01" day),

      (select "B" user_id, "2015-02-01" day),
      (select "B" user_id, "2015-02-02" day),
      (select "B" user_id, "2015-02-02" day),
      (select "B" user_id, "2015-02-02" day),

      (select "C" user_id, "2015-02-01" day),
      (select "C" user_id, "2015-02-02" day),

      (select "D" user_id, "2015-02-04" day)
    GROUP BY user_id
  ) 
  GROUP BY day
)
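
On the sample data this query returns only 2015-02-01 (3) and 2015-02-04 (4), because those are the only days on which a user is first seen. To check the logic outside BigQuery, here is a minimal Python sketch of the same three steps (first day per user, new users per day, running sum), with the missing dates filled in by walking every calendar day. The function name `running_unique_users` and the day-by-day fill are illustrative assumptions standing in for the LEFT JOIN against a date list. They are not part of the query above.

```python
from datetime import date, timedelta


def running_unique_users(rows):
    """rows: iterable of (user_id, 'YYYY-MM-DD') pairs.

    Returns a list of (day, running_count) with one entry per calendar day
    between the first and last day present in the data.
    """
    first_seen = {}   # user_id -> first day seen (MIN(day) ... GROUP BY user_id)
    last_day = None   # latest day present anywhere in the data
    for user_id, day in rows:
        d = date.fromisoformat(day)
        if user_id not in first_seen or d < first_seen[user_id]:
            first_seen[user_id] = d
        if last_day is None or d > last_day:
            last_day = d
    if not first_seen:
        return []

    # New users per day (the inner COUNT ... GROUP BY day).
    new_per_day = {}
    for d in first_seen.values():
        new_per_day[d] = new_per_day.get(d, 0) + 1

    # Running sum over every calendar day, so days without new users
    # still appear and carry the previous total forward.
    result = []
    total = 0
    d = min(new_per_day)
    while d <= last_day:
        total += new_per_day.get(d, 0)
        result.append((d.isoformat(), total))
        d += timedelta(days=1)
    return result


# Same sample data as in the question.
rows = (
    [("A", "2015-02-01")] * 6
    + [("B", "2015-02-01")] + [("B", "2015-02-02")] * 3
    + [("C", "2015-02-01"), ("C", "2015-02-02")]
    + [("D", "2015-02-04")]
)

for day, running_count in running_unique_users(rows):
    print(day, running_count)
```

On the sample rows this prints one line per day from 2015-02-01 through 2015-02-04, matching the table requested in the question.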