Sliding window aggregation: BigQuery 15-minute aggregation

Time: 2017-10-26 12:42:55

Tags: time-series google-bigquery aggregate

I have a table like this:

Row time        viewCount
1   00:00:00    31
2   00:00:01    44
3   00:00:02    78
4   00:00:03    71
5   00:00:04    72
6   00:00:05    73
7   00:00:06    64
8   00:00:07    70

I want to aggregate it into:

Row time        viewCount
1   00:00:00    31
2   00:15:00    445
3   00:30:00    700
4   00:45:00    500
5   01:00:04    121
6   01:15:00    475
...

Please help. Thanks in advance.

2 answers:

Answer 0: (score: 1)

Below is for BigQuery Standard SQL.

   

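Note: you provided a simplified example of your data, and the query below follows it, so instead of aggregating every 15 minutes it aggregates every 2 seconds. This is so you can easily test and play with it. It can be adjusted to 15 minutes by changing SECOND to MINUTE in 3 places and 2 to 15 in 3 places. The example also uses the TIME data type for the time field, since your example is limited to a 24-hour period; in your real data you most likely have DATETIME or TIMESTAMP. In that case you will also need to replace all TIME_* functions with the corresponding DATETIME_* or TIMESTAMP_* functions.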

So, finally, the query is:

#standardSQL
-- Sample data standing in for the real table
WITH `project.dataset.table` AS (
  SELECT TIME '00:00:00' time, 31 viewCount UNION ALL
  SELECT TIME '00:00:01', 44 UNION ALL
  SELECT TIME '00:00:02', 78 UNION ALL
  SELECT TIME '00:00:03', 71 UNION ALL
  SELECT TIME '00:00:04', 72 UNION ALL
  SELECT TIME '00:00:05', 73 UNION ALL
  SELECT TIME '00:00:06', 64 UNION ALL
  SELECT TIME '00:00:07', 70
),
-- Overall time range covered by the data, in seconds
period AS (
  SELECT MIN(time) min_time, MAX(time) max_time, TIME_DIFF(MAX(time), MIN(time), SECOND) diff
  FROM `project.dataset.table`
),
-- One [start_time, end_time) window for every 2-second step in that range
checkpoints AS (
  SELECT TIME_ADD(min_time, INTERVAL step SECOND) start_time, TIME_ADD(min_time, INTERVAL step + 2 SECOND) end_time
  FROM period, UNNEST(GENERATE_ARRAY(0, diff + 2, 2)) step
)
-- Assign each row to the window that contains it and sum per window
SELECT start_time time, SUM(viewCount) viewCount
FROM checkpoints c
JOIN `project.dataset.table` t
ON t.time >= c.start_time AND t.time < c.end_time
GROUP BY start_time
ORDER BY start_time

The result is:

Row time        viewCount    
1   00:00:00    75   
2   00:00:02    149  
3   00:00:04    145  
4   00:00:06    134  
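
For reference, here is a rough sketch of what the adaptation described in the note might look like for 15-minute windows on a TIMESTAMP column. The column name `ts` and the sample rows are made up for illustration; only the SECOND to MINUTE, 2 to 15, and TIME_* to TIMESTAMP_* substitutions come from the note above.

```sql
#standardSQL
-- Sketch only: hypothetical sample rows and a hypothetical column name `ts`
WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2017-10-26 00:00:00' ts, 31 viewCount UNION ALL
  SELECT TIMESTAMP '2017-10-26 00:07:00', 44 UNION ALL
  SELECT TIMESTAMP '2017-10-26 00:16:00', 78 UNION ALL
  SELECT TIMESTAMP '2017-10-26 00:50:00', 71
),
-- Overall time range covered by the data, in whole minutes
period AS (
  SELECT MIN(ts) min_ts, TIMESTAMP_DIFF(MAX(ts), MIN(ts), MINUTE) diff
  FROM `project.dataset.table`
),
-- One [start_ts, end_ts) window for every 15-minute step in that range
checkpoints AS (
  SELECT
    TIMESTAMP_ADD(min_ts, INTERVAL step MINUTE) start_ts,
    TIMESTAMP_ADD(min_ts, INTERVAL step + 15 MINUTE) end_ts
  FROM period, UNNEST(GENERATE_ARRAY(0, diff + 15, 15)) step
)
SELECT start_ts AS time, SUM(viewCount) AS viewCount
FROM checkpoints c
JOIN `project.dataset.table` t
ON t.ts >= c.start_ts AND t.ts < c.end_ts
GROUP BY start_ts
ORDER BY start_ts
```

Note that the windows are anchored at the earliest timestamp in the data, so they line up with clock quarter-hours only if that first timestamp does.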

Answer 1: (score: 1)

Assuming that you actually have a TIMESTAMP column, you can use an approach like this:

#standardSQL
SELECT
  TIMESTAMP_SECONDS(
    UNIX_SECONDS(timestamp) -
    MOD(UNIX_SECONDS(timestamp), 15 * 60)
  ) AS time,
  SUM(viewCount) AS viewCount
FROM `project.dataset.table`
GROUP BY time;

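It relies on conversion to Unix seconds in order to compute the 15-minute intervals. Note that, unlike Mikhail's solution, it won't produce rows with a zero count for empty 15-minute intervals (it's not clear whether this matters to you).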
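
If rows for empty intervals do matter, one possible sketch is to generate every bucket between the first and last one and LEFT JOIN the bucketed data onto that list. The sample rows, the column name `ts`, and the CTE names `bucketed` and `buckets` are assumptions made for illustration, not part of the original answer. (A LEFT JOIN is the key ingredient: with an inner JOIN, windows that match no rows drop out of the result.)

```sql
#standardSQL
-- Sketch only: hypothetical sample rows and a hypothetical column name `ts`
WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2017-10-26 00:03:00' AS ts, 31 AS viewCount UNION ALL
  SELECT TIMESTAMP '2017-10-26 00:14:59', 44 UNION ALL
  SELECT TIMESTAMP '2017-10-26 00:47:10', 78
),
-- Floor each timestamp to the start of its 15-minute interval
bucketed AS (
  SELECT
    TIMESTAMP_SECONDS(
      UNIX_SECONDS(ts) - MOD(UNIX_SECONDS(ts), 15 * 60)
    ) AS bucket,
    viewCount
  FROM `project.dataset.table`
),
-- Every 15-minute bucket between the first and the last one, including empty ones
buckets AS (
  SELECT bucket
  FROM (SELECT MIN(bucket) AS first_bucket, MAX(bucket) AS last_bucket FROM bucketed),
    UNNEST(GENERATE_TIMESTAMP_ARRAY(first_bucket, last_bucket, INTERVAL 15 MINUTE)) AS bucket
)
-- LEFT JOIN keeps buckets with no rows; IFNULL turns their NULL sum into 0
SELECT b.bucket AS time, IFNULL(SUM(d.viewCount), 0) AS viewCount
FROM buckets AS b
LEFT JOIN bucketed AS d
ON d.bucket = b.bucket
GROUP BY b.bucket
ORDER BY b.bucket
```

With these sample rows, the 00:15 and 00:30 buckets contain no data and should come back with a viewCount of 0 instead of being omitted.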