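How do I compute a rolling average over the same weekday as the current row in BigQuery?

Asked: 2018-04-18 13:10:30

Tags: google-bigquery

Due to some constraints, I have to use Legacy SQL.

Right now I have the expression below, which is a rolling average over the past 21 days (i.e. 3 weeks). What I am really looking for is a rolling average over the 3 previous days that fall on the same weekday as the current row.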

AVG(sales_total) OVER (PARTITION BY id ORDER BY date RANGE BETWEEN 22 PRECEDING AND 1 PRECEDING) AS avg_of_last_3_week

Edit:

Table A

+-------+---------+---------+-----------+
| id    |  date   | weekday |sales_total|
+-------+---------+---------+-----------+
| 1     | 01-01-17|    1    |     5     |
| 2     | 01-02-17|    2    |     .     |
| 3     | 01-03-17|    3    |     .     |
| 1     | 01-08-17|    1    |     10    |
| 2     | 01-09-17|    2    |     .     |
| 3     | 01-10-17|    3    |     .     |
| 1     | 01-15-17|    1    |     15    |
| 2     | 01-16-17|    2    |     .     |
| 3     | 01-17-17|    3    |     .     |
+-------+---------+---------+-----------+

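I would like the query to return Table A with one extra column, the rolling average. For example, the row below is what I would expect for id 1 on 01-22-17. The rolling average is simply the average of the previous 3 Sundays: (5 + 10 + 15) / 3 = 10.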

+-------+---------+---------+-----------+-----------+
| id    |  date   | weekday |sales_total|rolling_avg|
+-------+---------+---------+-----------+-----------+
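| 1     | 01-22-17|    1    |     15    |    10     |
+-------+---------+---------+-----------+-----------+

Thanks

1 Answer:

Answer 0 (score: 2)

The example below is for BigQuery Standard SQL (if you are still stuck with Legacy SQL, you can easily translate it to Legacy).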

   
#standardSQL
SELECT id, sales_date, weekday, sales_total, 
  AVG(sales_total) OVER(rolling_3_previous_same_weekdays) rolling_avg
FROM (
  SELECT *, EXTRACT(DAYOFWEEK FROM sales_date) weekday
  FROM `project.dataset.your_table`
)
WINDOW rolling_3_previous_same_weekdays AS (
  PARTITION BY id, weekday 
  ORDER BY sales_date 
  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  )
ORDER BY weekday, sales_date

You can test and play with it using dummy data, as below:

#standardSQL
WITH t AS (
  SELECT 1 id, 1 AS sales_total, DATE '2017-01-01' sales_date UNION ALL
  SELECT 1,  2, DATE '2017-01-02' UNION ALL
  SELECT 1,  3, DATE '2017-01-03' UNION ALL
  SELECT 1,  4, DATE '2017-01-04' UNION ALL
  SELECT 1,  5, DATE '2017-01-05' UNION ALL
  SELECT 1,  6, DATE '2017-01-06' UNION ALL
  SELECT 1,  7, DATE '2017-01-07' UNION ALL
  SELECT 1,  8, DATE '2017-01-08' UNION ALL
  SELECT 1,  9, DATE '2017-01-09' UNION ALL
  SELECT 1, 10, DATE '2017-01-10' UNION ALL
  SELECT 1, 11, DATE '2017-01-11' UNION ALL
  SELECT 1, 12, DATE '2017-01-12' UNION ALL
  SELECT 1, 13, DATE '2017-01-13' UNION ALL
  SELECT 1, 14, DATE '2017-01-14' UNION ALL
  SELECT 1, 15, DATE '2017-01-15' UNION ALL
  SELECT 1, 16, DATE '2017-01-16' UNION ALL
  SELECT 1, 17, DATE '2017-01-17' UNION ALL
  SELECT 1, 18, DATE '2017-01-18' UNION ALL
  SELECT 1, 19, DATE '2017-01-19' UNION ALL
  SELECT 1, 20, DATE '2017-01-20' UNION ALL
  SELECT 1, 21, DATE '2017-01-21' UNION ALL
  SELECT 1, 22, DATE '2017-01-22' UNION ALL
  SELECT 1, 23, DATE '2017-01-23' UNION ALL
  SELECT 1, 24, DATE '2017-01-24' UNION ALL
  SELECT 1, 25, DATE '2017-01-25' UNION ALL
  SELECT 1, 26, DATE '2017-01-26' UNION ALL
  SELECT 1, 27, DATE '2017-01-27' UNION ALL
  SELECT 1, 28, DATE '2017-01-28' UNION ALL
  SELECT 1, 29, DATE '2017-01-29' UNION ALL
  SELECT 1, 30, DATE '2017-01-30' UNION ALL
  SELECT 1, 31, DATE '2017-01-31' 
)
SELECT id, sales_date, weekday, sales_total, 
  AVG(sales_total) OVER(rolling_3_previous_same_weekdays) rolling_avg
FROM (
  SELECT *, EXTRACT(DAYOFWEEK FROM sales_date) weekday
  FROM t
)
WINDOW rolling_3_previous_same_weekdays AS (
  PARTITION BY id, weekday 
  ORDER BY sales_date 
  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  )
ORDER BY weekday, sales_date
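
Note that the frame `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` includes the current row in the average. The expected output in the question (10 for 01-22-17) averages only the three earlier Sundays, so a frame that ends one row before the current row may be closer to what was asked. Below is a minimal sketch of that variant. The window name `previous_3_same_weekdays` and the four-row dummy table are illustrative assumptions built from the Sunday values in the question.

```sql
#standardSQL
WITH t AS (
  -- Dummy data (assumption): the Sunday rows for id 1 from the question
  SELECT 1 AS id, 5 AS sales_total, DATE '2017-01-01' AS sales_date UNION ALL
  SELECT 1, 10, DATE '2017-01-08' UNION ALL
  SELECT 1, 15, DATE '2017-01-15' UNION ALL
  SELECT 1, 15, DATE '2017-01-22'
)
SELECT id, sales_date, weekday, sales_total,
  AVG(sales_total) OVER(previous_3_same_weekdays) AS rolling_avg
FROM (
  SELECT *, EXTRACT(DAYOFWEEK FROM sales_date) AS weekday
  FROM t
)
WINDOW previous_3_same_weekdays AS (
  PARTITION BY id, weekday
  ORDER BY sales_date
  -- The frame ends one row before the current row, so the current day is excluded
  ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING
)
ORDER BY weekday, sales_date
```

With this data the row for 2017-01-22 gets a rolling_avg of 10, and the first row gets NULL because its frame is empty. Keep in mind that a ROWS frame counts rows rather than calendar weeks: if a week is missing for some id, the frame reaches further back than 21 days.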

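I hope that once you understand the approach above, you can easily reproduce it in BigQuery Legacy SQL. The only function used here that is specific to Standard SQL is EXTRACT(), and it looks like you do not even need it, since the weekday is already part of your data.

Good luck! :O)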
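
For reference, here is a rough, untested sketch of how the same idea might look in Legacy SQL. It assumes the table already has the `id`, `date`, `weekday` and `sales_total` columns shown in the question. The table path `[project:dataset.your_table]` is a placeholder, and the window is written inline rather than as a named window.

```sql
#legacySQL
-- Sketch only: column names are taken from the question, the table path is a placeholder
SELECT id, date, weekday, sales_total,
  AVG(sales_total) OVER (
    PARTITION BY id, weekday
    ORDER BY date
    -- Same frame as the Standard SQL query: the current row plus the two before it.
    -- To average only earlier days, use ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING.
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS rolling_avg
FROM [project:dataset.your_table]
ORDER BY weekday, date
```

Partitioning by `weekday` is what restricts each average to rows on the same day of the week, so no date arithmetic is needed in the frame itself.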