BigQuery and Standard SQL: how to group by an arbitrary day interval

Time: 2016-11-08 22:54:25

Tags: sql google-bigquery

I'm new to BigQuery and SQL, and I keep running into grouping problems. Using Standard SQL in BigQuery, I'd like to group data by X days. Here is a sample table:

event_id |    url    |          timestamp   
-----------------------------------------------------------
   xx         a.html      2016-10-18 15:55:16 UTC
   xx         a.html      2016-10-19 16:68:55 UTC
   xx         a.html      2016-10-25 20:55:57 UTC
   yy         b.html      2016-10-18 15:58:09 UTC
   yy         b.html      2016-10-18 08:32:43 UTC
   zz         a.html      2016-10-20 04:44:22 UTC
   zz         c.html      2016-10-21 02:12:34 UTC

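I want to count the occurrences of each event on each URL in X-day intervals, starting from a given date. For example: how can I group this into 3-day intervals, with the first interval starting at 2016-10-18 00:00:00 UTC? Also, can I assign the third (last) day of the interval to each row? Example output: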

event_id |    url    |  count |        3dayIntervalLabel   
-----------------------------------------------------------
   xx         a.html      2           2016-10-20  --> [18th thru 20th]
   yy         b.html      2           2016-10-20
   zz         a.html      1           2016-10-20 
   zz         c.html      1           2016-10-23  --> [21st thru 23rd]
   xx         a.html      1           2016-10-26  --> [24th thru 26th]

I added the three annotations to clarify the 3dayIntervalLabel values.

In general, I'd like to solve this: group by X-day intervals starting from date Y, and label each interval with its final date.

Let me know if further clarification is needed.

If you're interested, I also asked a similar question on StackOverflow (and got an answer) about grouping this data with a rolling window: initial question

Thanks!

2 Answers:

Answer 0: (score: 3)

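-- Step 1: count events per day, url and event_id.
-- `ts` is assumed to be the TIMESTAMP column (named `timestamp` in the question's table).
WITH dailyAggregations AS (
  SELECT 
    DATE(ts) AS day, 
    url, 
    event_id, 
    UNIX_SECONDS(TIMESTAMP(DATE(ts))) AS sec,  -- not used by the final SELECT
    COUNT(1) AS events 
  FROM yourTable
  GROUP BY day, url, event_id, sec
),
-- Step 2: one row per 3-day interval: first day and last day (first day + 2).
calendar AS (
  SELECT day, DATE_ADD(day, INTERVAL 2 DAY) AS endday
  FROM UNNEST (GENERATE_DATE_ARRAY('2016-10-18', '2016-11-06', INTERVAL 3 DAY)) AS day
)
-- Step 3: attach each daily count to the interval containing it and sum per interval.
SELECT 
  event_id, 
  url,  
  SUM(events) AS `count`, 
  c.endday AS `ThreedayIntervalLabel`  
FROM calendar AS c
JOIN dailyAggregations AS a
ON a.day BETWEEN c.day AND c.endday
GROUP BY endday, url, event_id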
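
To make the calendar join easier to follow, here is a rough Python sketch of the same logic run against the question's sample rows. It is not BigQuery code: the helper names `build_calendar` and `count_by_interval` are made up for illustration, and the timestamps are reduced to dates by hand.

```python
from collections import Counter
from datetime import date, timedelta


def build_calendar(first_day, last_day, step_days):
    """Return (start, end) pairs: one interval every `step_days` days."""
    calendar = []
    start = first_day
    while start <= last_day:
        # The interval's last day is its first day plus (step_days - 1).
        calendar.append((start, start + timedelta(days=step_days - 1)))
        start += timedelta(days=step_days)
    return calendar


def count_by_interval(events, calendar):
    """Count events per (event_id, url, interval end date)."""
    counts = Counter()
    for event_id, url, day in events:
        for start, end in calendar:
            if start <= day <= end:  # same test as BETWEEN in the query
                counts[(event_id, url, end)] += 1
    return counts


# Sample rows from the question, with timestamps truncated to dates.
EVENTS = [
    ("xx", "a.html", date(2016, 10, 18)),
    ("xx", "a.html", date(2016, 10, 19)),
    ("xx", "a.html", date(2016, 10, 25)),
    ("yy", "b.html", date(2016, 10, 18)),
    ("yy", "b.html", date(2016, 10, 18)),
    ("zz", "a.html", date(2016, 10, 20)),
    ("zz", "c.html", date(2016, 10, 21)),
]

calendar = build_calendar(date(2016, 10, 18), date(2016, 11, 6), 3)
counts = count_by_interval(EVENTS, calendar)

for (event_id, url, label), n in sorted(counts.items(), key=lambda kv: kv[0][2]):
    print(event_id, url, n, label)
```

The nested loop is only there to mirror the join; for real data the query itself is the place to do this work.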

Answer 1: (score: 0)

If you have a base date, then something like this:

-- days = zero-based index of the 3-day interval, counted from the base date
select floor(date_diff(date(timestamp), date '2016-10-18', day) / 3) as days,
       count(*)
from t
group by days
order by days;
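
The query above only returns the interval index. As a sketch of how that index maps to the end-of-interval label asked for in the question, the arithmetic can be checked in Python. `interval_label` is a hypothetical helper, and it assumes the date falls on or after the base date.

```python
from datetime import date, timedelta


def interval_label(day, base, x_days):
    """Return the last day of the x_days-long interval that contains `day`.

    Assumes day >= base; intervals start at `base` and do not overlap.
    """
    index = (day - base).days // x_days          # same idea as floor(date_diff / 3)
    last_offset = index * x_days + (x_days - 1)  # offset of the interval's final day
    return base + timedelta(days=last_offset)


BASE = date(2016, 10, 18)

for d in (date(2016, 10, 18), date(2016, 10, 21), date(2016, 10, 25)):
    print(d, "->", interval_label(d, BASE, 3))
```

Grouping by this label together with `event_id` and `url` would give the shape of output shown in the question.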