Google BigQuery: Rolling Count Distinct

Time: 2016-02-03 10:25:38

Tags: sql google-bigquery

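I have a table that is simply a list of dates and user IDs (not aggregated).

We define a metric called active users for a given date: the number of distinct IDs that appear in the previous 45 days.

I am trying to run a query in BigQuery that returns, for each day, the day plus the number of active users for that day (the count of distinct users from 45 days earlier up to and including that day).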
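To pin the metric down independently of any SQL dialect, here is a minimal Python sketch of the definition. The function name `active_users_by_day` and the in-memory row format are assumptions made for illustration; the window is treated as inclusive at both ends (the day itself and the day 45 days earlier).

```python
from datetime import date, timedelta

def active_users_by_day(rows, window_days=45):
    """Map each day in rows to the number of distinct ids seen in
    [day - window_days, day], both ends inclusive."""
    days = sorted({day for day, _ in rows})
    result = {}
    for current in days:
        start = current - timedelta(days=window_days)
        ids = {uid for day, uid in rows if start <= day <= current}
        result[current] = len(ids)
    return result

# Hypothetical sample rows: (day, user id), not aggregated.
rows = [
    (date(2016, 1, 1), "a"),
    (date(2016, 1, 1), "b"),
    (date(2016, 1, 20), "a"),
    (date(2016, 2, 15), "c"),  # exactly 45 days after 2016-01-01
    (date(2016, 2, 16), "c"),  # 46 days after 2016-01-01
]
counts = active_users_by_day(rows)
for day, n in counts.items():
    print(day, n)
```

On 2016-02-15 user "b" still counts because the visit on 2016-01-01 is exactly 45 days old; one day later it falls out of the window.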

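I have experimented with window functions, but I can't figure out how to define a range based on the date values in a column. Instead, I believe the following query would work in a database like MySQL, but it does not work in BigQuery.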

SELECT 
  day,
  (SELECT 
    COUNT(DISTINCT visid) 
   FROM daily_users
   WHERE day BETWEEN DATE_ADD(t.day, -45, "DAY") AND t.day
   ) AS active_users
FROM daily_users AS t
GROUP BY 1

This does not work in BigQuery: "Subselect not allowed in SELECT clause."

How can I do this in BigQuery?

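2 Answers:

Answer 0: (score: 1)

The BigQuery documentation claims that count(distinct) works as a window function. However, that does not help you, because you are not looking for a traditional window frame.

One method is to add a record for each date after a visit: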

select theday, count(distinct visid)
from (select date_add(u.day, n.n, "day") as theday, u.visid
      from daily_users u cross join
           (select 1 as n union all select 2 union all . . .
            select 45
           ) n
     ) u
group by theday;

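Note: there may be a simpler way to generate a series of 45 integers in BigQuery; in standard SQL, for example, UNNEST(GENERATE_ARRAY(1, 45)) produces one. Also, to match the inclusive BETWEEN in the question, the offsets should start at 0 rather than 1, so that a visit counts toward its own day as well.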
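The fan-out idea can be checked outside BigQuery. Below is a minimal Python sketch of it, under two assumptions that are not in the query above: offsets run from 0 through 45 so that a visit also counts on its own day, and only days that actually occur in the input are reported. The function name `active_users_fanout` is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def active_users_fanout(rows, window_days=45):
    """Copy every visit onto each of the following window_days days
    (plus the visit day itself), then count distinct ids per day."""
    seen = defaultdict(set)
    for day, uid in rows:
        for n in range(window_days + 1):  # offsets 0..window_days
            seen[day + timedelta(days=n)].add(uid)
    observed_days = {day for day, _ in rows}  # drop generated-only dates
    return {d: len(seen[d]) for d in sorted(observed_days)}

# Hypothetical sample rows: (day, user id).
rows = [
    (date(2016, 1, 1), "a"),
    (date(2016, 1, 1), "b"),
    (date(2016, 1, 20), "a"),
    (date(2016, 2, 15), "c"),
    (date(2016, 2, 16), "c"),
]
fanout_counts = active_users_fanout(rows)
for day, n in fanout_counts.items():
    print(day, n)
```

The trade-off is the same as in the SQL version: every input row is multiplied by the window length before the distinct count, which is simple but can be expensive on a large table.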

Answer 1: (score: 1)

The following should work in BigQuery:

#legacySQL
SELECT day, active_users FROM (
  SELECT 
    day, 
    COUNT(DISTINCT id) 
      OVER (ORDER BY ts RANGE BETWEEN 45*24*3600 PRECEDING AND CURRENT ROW) AS active_users
  FROM (
    SELECT day, id, TIMESTAMP_TO_SEC(TIMESTAMP(day)) AS ts 
    FROM daily_users
  )
) GROUP BY 1, 2 ORDER BY 1  

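The above assumes that the day field is stored in '2016-01-10' format.
If that is not the case, adjust TIMESTAMP_TO_SEC(TIMESTAMP(day)) in the innermost select accordingly.

See also the details of COUNT(DISTINCT) in BigQuery: in legacy SQL it returns a statistical approximation once the number of distinct values exceeds a threshold, while EXACT_COUNT_DISTINCT returns an exact count.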


Update for BigQuery Standard SQL

#standardSQL
SELECT 
  day, 
  (SELECT COUNT(DISTINCT id) FROM UNNEST(active_users) id) AS active_users
FROM (
  SELECT 
    day, 
    ARRAY_AGG(id) 
      OVER (ORDER BY ts RANGE BETWEEN 3888000 PRECEDING AND CURRENT ROW) AS active_users
  FROM (
    SELECT day, id,  UNIX_DATE(PARSE_DATE('%Y-%m-%d', day)) * 24 * 3600 AS ts 
    FROM daily_users
  )
) 
GROUP BY 1, 2 
ORDER BY 1  

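You can test and play with it using the dummy sample below. The window here is 86400 seconds (one day) rather than 45 days, so that the effect is visible on four days of data:
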
#standardSQL
WITH daily_users AS (
  SELECT 1 AS id, '2016-01-10' AS day UNION ALL
  SELECT 2 AS id, '2016-01-10' AS day UNION ALL
  SELECT 1 AS id, '2016-01-11' AS day UNION ALL
  SELECT 3 AS id, '2016-01-11' AS day UNION ALL
  SELECT 1 AS id, '2016-01-12' AS day UNION ALL
  SELECT 1 AS id, '2016-01-12' AS day UNION ALL
  SELECT 1 AS id, '2016-01-12' AS day UNION ALL
  SELECT 1 AS id, '2016-01-13' AS day
)
SELECT 
  day, 
  (SELECT COUNT(DISTINCT id) FROM UNNEST(active_users) id) AS active_users
FROM (
  SELECT 
    day, 
    ARRAY_AGG(id) 
      OVER (ORDER BY ts RANGE BETWEEN 86400 PRECEDING AND CURRENT ROW) AS active_users
  FROM (
    SELECT day, id,  UNIX_DATE(PARSE_DATE('%Y-%m-%d', day)) * 24 * 3600 AS ts 
    FROM daily_users
  )
) 
GROUP BY 1, 2 
ORDER BY 1
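
To see what the sample should return, here is a Python sketch that mimics the query step by step: convert each day to seconds since the Unix epoch, collect the ids of all rows whose timestamp lies within the range before the current row's timestamp (a RANGE frame also includes rows with the same timestamp), then count the distinct ids. The function name `rolling_distinct` is hypothetical and the code is only a model of the query, not something BigQuery runs.

```python
from datetime import date

SECONDS_PER_DAY = 24 * 3600
EPOCH = date(1970, 1, 1)

def rolling_distinct(rows, range_seconds):
    """rows: list of (id, 'YYYY-MM-DD'). Returns {day: distinct ids in window}."""
    # Same idea as UNIX_DATE(PARSE_DATE(...)) * 24 * 3600 in the query.
    with_ts = [
        (uid, day, (date.fromisoformat(day) - EPOCH).days * SECONDS_PER_DAY)
        for uid, day in rows
    ]
    result = {}
    for _, day, ts in with_ts:
        # RANGE BETWEEN range_seconds PRECEDING AND CURRENT ROW
        window_ids = [
            uid for uid, _, other_ts in with_ts
            if ts - range_seconds <= other_ts <= ts
        ]
        result[day] = len(set(window_ids))  # COUNT(DISTINCT id) over the array
    return dict(sorted(result.items()))

sample = [
    (1, '2016-01-10'), (2, '2016-01-10'),
    (1, '2016-01-11'), (3, '2016-01-11'),
    (1, '2016-01-12'), (1, '2016-01-12'), (1, '2016-01-12'),
    (1, '2016-01-13'),
]
one_day = rolling_distinct(sample, 86400)
print(one_day)
# {'2016-01-10': 2, '2016-01-11': 3, '2016-01-12': 2, '2016-01-13': 1}
```

For the real 45-day metric the range is 45 * 24 * 3600 = 3888000 seconds, which is the constant used in the standard SQL query.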