Calculating 7-, 14-, and 30-day moving averages in BigQuery

Time: 2019-02-28 08:04:25

Tags: google-bigquery

I'm experimenting with BigQuery. My input is a table of IoT uptime recordings:

+---------------+-------------+----------+------------+
|   device_id   |  reference  |  uptime  | timestamp  |
+---------------+-------------+----------+------------+
| 1             | 1000-5      |  0.7     | 2019-02-12 |
| 2             | 1000-6      |  0.9     | 2019-02-12 |
| 1             | 1000-5      |  0.8     | 2019-02-11 |
| 2             | 1000-6      |  0.95    | 2019-02-11 |
+---------------+-------------+----------+------------+

I want to calculate the 7-, 14-, and 30-day moving averages of uptime, grouped by device. The output should look like this:

+---------------+-------------+---------+--------+--------+
|   device_id   |  reference  |  avg_7  | avg_14 | avg_30 |
+---------------+-------------+---------+--------+--------+
| 1             | 1000-5      |  0.7    | ..     | ..     |
| 2             | 1000-6      |  0.9    | ..     | ..     |
+---------------+-------------+---------+--------+--------+

What I've tried:

SELECT
    device_id,
    AVG(uptime) OVER (ORDER BY day RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7d
FROM (
  SELECT device_id, uptime, UNIX_DATE(DATE(timestamp)) as day FROM `uptime_recordings`
)
GROUP BY device_id, uptime, day

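I have recordings for 1,000 different devices and 200k readings in total. The grouping doesn't work: the query returns 200k records instead of 1,000. Any ideas?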

1 answer:

Answer 0 (score: 1)

  

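"I have recordings for 1,000 different devices and 200k readings in total. The grouping doesn't work: the query returns 200k records instead of 1,000. Any ideas?"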

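Instead of GROUP BY device_id, uptime, day, use GROUP BY device_id, day. Grouping by uptime as well leaves almost every reading in its own group, so nothing is collapsed. Aggregate to one row per device per day first, then compute the moving average with a window partitioned by device_id. Note that a window function does not reduce the row count by itself: the result has one row per device per day.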

The full query:

WITH data 
AS (
  SELECT title device_id, views uptime, datehour timestamp
  FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
  WHERE DATE(datehour) BETWEEN '2019-01-01' AND '2019-01-09'
  AND wiki='br'
  AND title='Chile'
)

SELECT device_id, day
  , AVG(uptime) OVER (PARTITION BY device_id ORDER BY UNIX_DATE(day) RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7d
FROM (
  SELECT device_id, AVG(uptime) uptime, (DATE(timestamp)) as day
  FROM `data`
  GROUP BY device_id, day
)
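
The same pattern extends to the three windows asked for in the question. The following is a minimal sketch rather than a tested production query: it assumes the question's column names, and the inline `uptime_recordings` CTE is hypothetical sample data standing in for the real table. Because the window is ordered by `UNIX_DATE(day)`, a `RANGE` frame counts calendar days, so days with no readings do not stretch the window.

```sql
-- Hypothetical sample rows standing in for the real `uptime_recordings` table.
WITH uptime_recordings AS (
  SELECT * FROM UNNEST([
    STRUCT(1 AS device_id, '1000-5' AS reference, 0.7 AS uptime,
           TIMESTAMP '2019-02-12 00:00:00' AS timestamp),
    STRUCT(2, '1000-6', 0.9,  TIMESTAMP '2019-02-12 00:00:00'),
    STRUCT(1, '1000-5', 0.8,  TIMESTAMP '2019-02-11 00:00:00'),
    STRUCT(2, '1000-6', 0.95, TIMESTAMP '2019-02-11 00:00:00')
  ])
),
daily AS (
  -- Collapse the raw readings to one row per device per day.
  SELECT device_id, reference, DATE(timestamp) AS day, AVG(uptime) AS uptime
  FROM uptime_recordings
  GROUP BY device_id, reference, day
)
SELECT
  device_id,
  reference,
  day,
  -- N-day window = the current day plus the N-1 preceding calendar days.
  AVG(uptime) OVER (PARTITION BY device_id ORDER BY UNIX_DATE(day)
                    RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7,
  AVG(uptime) OVER (PARTITION BY device_id ORDER BY UNIX_DATE(day)
                    RANGE BETWEEN 13 PRECEDING AND CURRENT ROW) AS avg_14,
  AVG(uptime) OVER (PARTITION BY device_id ORDER BY UNIX_DATE(day)
                    RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) AS avg_30
FROM daily
ORDER BY device_id, day
```

This returns one row per device per day. Each average is taken over the daily averages, not over the raw readings, so days with many readings do not weigh more than days with few.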


Edit: as requested in the comments (although I'm not sure what the goal of averaging all the 7-day averages is):

WITH data 
AS (
  SELECT title device_id, views uptime, datehour timestamp
  FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
  WHERE DATE(datehour) BETWEEN '2019-01-01' AND '2019-01-09'
  AND wiki='br'
  AND title IN ('Chile', 'Saozneg')
)

SELECT device_id, AVG(avg_7d) avg_avg_7d
FROM (
  SELECT device_id, day
    , AVG(uptime) OVER (PARTITION BY device_id ORDER BY UNIX_DATE(day) RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7d
  FROM (
    SELECT device_id, AVG(uptime) uptime, (DATE(timestamp)) as day
    FROM `data`
    GROUP BY device_id, day
  )
)
GROUP BY device_id 
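
If the goal is instead the one-row-per-device table sketched in the question, one possible reading is "the trailing averages as of each device's most recent day". The sketch below is an assumption about that intent, not part of the original answer. It again uses hypothetical inline sample data in place of the real `uptime_recordings` table, and `latest_day` is a helper column introduced only for illustration.

```sql
-- Hypothetical sample rows standing in for the real `uptime_recordings` table.
WITH uptime_recordings AS (
  SELECT * FROM UNNEST([
    STRUCT(1 AS device_id, '1000-5' AS reference, 0.7 AS uptime,
           TIMESTAMP '2019-02-12 00:00:00' AS timestamp),
    STRUCT(2, '1000-6', 0.9,  TIMESTAMP '2019-02-12 00:00:00'),
    STRUCT(1, '1000-5', 0.8,  TIMESTAMP '2019-02-11 00:00:00'),
    STRUCT(2, '1000-6', 0.95, TIMESTAMP '2019-02-11 00:00:00')
  ])
),
daily AS (
  -- One row per device per day.
  SELECT device_id, reference, DATE(timestamp) AS day, AVG(uptime) AS uptime
  FROM uptime_recordings
  GROUP BY device_id, reference, day
),
with_latest AS (
  -- Attach each device's most recent day to all of its rows.
  SELECT *, MAX(day) OVER (PARTITION BY device_id) AS latest_day
  FROM daily
)
SELECT
  device_id,
  reference,
  -- AVG ignores NULLs, so only days inside each trailing window contribute.
  AVG(IF(day > DATE_SUB(latest_day, INTERVAL 7 DAY),  uptime, NULL)) AS avg_7,
  AVG(IF(day > DATE_SUB(latest_day, INTERVAL 14 DAY), uptime, NULL)) AS avg_14,
  AVG(IF(day > DATE_SUB(latest_day, INTERVAL 30 DAY), uptime, NULL)) AS avg_30
FROM with_latest
GROUP BY device_id, reference
ORDER BY device_id
```

Here the final GROUP BY does collapse the result to one row per device, because the averages are plain aggregates rather than window functions.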
