Is there a way to compute a rolling average in BigQuery?

Time: 2014-03-06 23:05:07

Tags: google-bigquery

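I know BigQuery has an AVG function, and that there are window functions that shift a value up or down a row to get the previous or next value, but is there a function that lets you average over a specified interval? For example, I would like something like the following: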

SELECT
    city,
    AVG(temperature) OVER(PARTITION BY city, INTERVAL day,14, ORDER BY day) as rolling_avg_14_days,
    AVG(temperature) OVER(PARTITION BY city, INTERVAL day,30, ORDER BY day) as rolling_avg_30_days
WHERE
    city IN ("Los Angeles","Chicago","Sun Prairie","Sunnyvale")
    AND year BETWEEN 1900 AND 2013

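I want a rolling average calculation that lets me specify the range of values the aggregate function runs over and the value to order by. The average would take the current temperature and the previous 13 days (or the previous 29 days) and average them. Is this possible today? I know I could do something like this by putting 13 LAG / OVER fields in the SELECT statement and averaging the results of all of them, but that is a lot of overhead.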

2 answers:

Answer 0 (score: 9)

I think the OVER clause with a RANGE frame is the best fit here.

Assuming the day field is stored as a string in 'YYYY-MM-DD' format, the query below computes the rolling averages:

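SELECT
  city,
  day,
  -- 13 days back plus the current row = 14 daily readings
  AVG(temperature) OVER(PARTITION BY city ORDER BY ts
                RANGE BETWEEN 13*24*3600 PRECEDING AND CURRENT ROW) AS rolling_avg_14_days,
  -- 29 days back plus the current row = 30 daily readings
  AVG(temperature) OVER(PARTITION BY city ORDER BY ts
                RANGE BETWEEN 29*24*3600 PRECEDING AND CURRENT ROW) AS rolling_avg_30_days
FROM (
  -- ts is the day in seconds since the Unix epoch, so frame widths are in seconds
  SELECT day, city, temperature, TIMESTAMP_TO_SEC(TIMESTAMP(day)) AS ts
  FROM temperatures
)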
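
To make the frame semantics concrete, here is a minimal Python sketch (not BigQuery code) of what a RANGE BETWEEN n PRECEDING AND CURRENT ROW frame computes for a single partition: each row averages every row whose sort key lies in [key - n, key], both ends included. The function name rolling_avg_by_range and the sample readings are made up for illustration. Because both ends are inclusive, with one reading per day a width of 13 days takes in 14 readings, while a width of 14 days takes in 15.

```python
SECONDS_PER_DAY = 24 * 3600


def rolling_avg_by_range(rows, width):
    """Average the values of all rows whose key lies in [key - width, key].

    rows:  list of (key, value) pairs for ONE partition (e.g. one city).
    width: frame width, in the same unit as the key (seconds here).
    Returns a list of (key, average) pairs in key order.
    """
    result = []
    for key, _ in sorted(rows):
        # Inclusive on both ends, like RANGE ... PRECEDING AND CURRENT ROW.
        in_frame = [v for k, v in rows if key - width <= k <= key]
        result.append((key, sum(in_frame) / len(in_frame)))
    return result


# Hypothetical data: one reading per day for 20 days, temperature == day index.
readings = [(i * SECONDS_PER_DAY, float(i)) for i in range(20)]

avg_13_back = rolling_avg_by_range(readings, 13 * SECONDS_PER_DAY)
avg_14_back = rolling_avg_by_range(readings, 14 * SECONDS_PER_DAY)

print(avg_13_back[-1][1])  # 12.5 (days 6..19, 14 readings)
print(avg_14_back[-1][1])  # 12.0 (days 5..19, 15 readings)
```

Early rows simply average over fewer readings, since nothing exists before the first day.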

You most likely found this solution long ago, but I still wanted to post what I think is a better answer to this question (as of today).

Answer 1 (score: 0)

Another option uses JOIN EACH (it can be too slow, because the intermediate step can generate an extremely large amount of data):

SELECT a.SensorId SensorId, a.Timestamp, AVG(b.Data) AS avg_prev_hour_load
FROM (
  SELECT * FROM [io_sensor_data.moscone_io13]
  WHERE SensorId = 'XBee_40670EB0/mic') a
JOIN EACH [io_sensor_data.moscone_io13] b
ON a.SensorId = b.SensorId
WHERE b.Timestamp BETWEEN (a.Timestamp - 36000000) AND a.Timestamp
GROUP BY SensorId, a.Timestamp;

(Based on Joe Celko's SQL puzzles)
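
The self-join idea can be sketched in plain Python to show where the cost comes from: every row is paired with each row of the same sensor that falls inside its window, and only then are the pairs grouped and averaged. The intermediate result therefore grows with roughly (rows) × (rows per window). The function name and sample data below are hypothetical.

```python
from collections import defaultdict


def rolling_avg_by_self_join(rows, width):
    """rows: list of (sensor_id, timestamp, value) tuples.

    Mimics: a JOIN b ON a.sensor = b.sensor
            WHERE b.ts BETWEEN a.ts - width AND a.ts
            GROUP BY a.sensor, a.ts  with AVG(b.value).
    Returns (averages keyed by (sensor, ts), number of intermediate pairs).
    """
    # Intermediate join result: one entry per matching (a, b) pair.
    pairs = [
        (a_sensor, a_ts, b_value)
        for a_sensor, a_ts, _ in rows
        for b_sensor, b_ts, b_value in rows
        if a_sensor == b_sensor and a_ts - width <= b_ts <= a_ts
    ]
    # GROUP BY (sensor, a.ts) and average the joined values.
    groups = defaultdict(list)
    for sensor, ts, value in pairs:
        groups[(sensor, ts)].append(value)
    averages = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    return averages, len(pairs)


# Hypothetical data: one sensor, timestamps 0..9, value equal to the timestamp.
sample = [("s1", t, float(t)) for t in range(10)]
averages, n_pairs = rolling_avg_by_self_join(sample, width=3)

print(averages[("s1", 9)])  # 7.5 (values 6, 7, 8, 9)
print(n_pairs)              # 34 intermediate pairs for only 10 input rows
```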

It might be useful to have a window function that supports larger ranges, but for now I would generate the query automatically.
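
As a sketch of what generating the query automatically could look like, the Python helper below builds the LAG-based query mentioned in the question for any window length. The function name build_lag_query is hypothetical, and the table name temperatures and the column names are borrowed from the first answer; treat the output as a starting point, not a tested BigQuery recipe. Two caveats: LAG counts rows, not days, so this assumes exactly one row per city per day, and rows with fewer than days - 1 predecessors get a NULL average because the missing LAG values are NULL.

```python
def build_lag_query(days, table="temperatures"):
    """Return SQL text averaging the current row and the previous days-1 rows."""
    if days < 1:
        raise ValueError("days must be at least 1")
    # One LAG column per previous row; t0 is the current row's temperature.
    lag_cols = [
        f"LAG(temperature, {n}) OVER (PARTITION BY city ORDER BY day) AS t{n}"
        for n in range(1, days)
    ]
    inner_cols = ",\n    ".join(["city", "day", "temperature AS t0"] + lag_cols)
    total = " + ".join(f"t{n}" for n in range(days))
    return (
        "SELECT city, day,\n"
        f"  ({total}) / {days} AS rolling_avg_{days}_days\n"
        "FROM (\n"
        f"  SELECT\n    {inner_cols}\n"
        f"  FROM {table}\n"
        ")"
    )


# Small example; build_lag_query(14) yields the 13 LAG columns from the question.
print(build_lag_query(3))
```

The LAG columns live in a subquery so that the outer SELECT only does plain arithmetic on named fields.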