BigQuery - windowed sum over the trailing year

Posted: 2018-11-21 21:12:49

Tags: google-bigquery

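I'm trying to use a window function over each SKU's daily sales quantity to get the SKU's total quantity for the last 365 days. If there were sales every day, I could simply use ROWS and PRECEDING, e.g.: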

ORDER BY
      CalendarFullDate ROWS BETWEEN 364 PRECEDING AND CURRENT ROW

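But here the sales dates are not evenly spaced: there are many days with no sales at all, so I can't just go back 364 rows and assume each row is one day.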
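To make the problem concrete, here is a small Python sketch (plain Python, not BigQuery; `rows_frame_sum` and `days_frame_sum` are hypothetical helper names) that mimics a ROWS frame and a date-based frame on a sparse series. A ROWS frame counts rows, so across a gap it reaches back much further in calendar time than the same number of days would. A one-row/one-day window is used here just to keep the numbers small.

```python
from datetime import date


def rows_frame_sum(rows, preceding):
    """ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW: counts rows, not days."""
    out = []
    for i in range(len(rows)):
        start = max(0, i - preceding)
        out.append(sum(q for _, q in rows[start:i + 1]))
    return out


def days_frame_sum(rows, preceding_days):
    """Sum of rows dated within `preceding_days` days before the current date, inclusive."""
    out = []
    for d, _ in rows:
        out.append(sum(q for d2, q in rows if 0 <= (d - d2).days <= preceding_days))
    return out


# Sparse daily sales, sorted by date: nothing between Jan 2 and Mar 1.
rows = [(date(2018, 1, 1), 10), (date(2018, 1, 2), 20), (date(2018, 3, 1), 30)]
print(rows_frame_sum(rows, 1))  # [10, 30, 50]
print(days_frame_sum(rows, 1))  # [10, 30, 30]
```

The third row's one-row frame still picks up the 2018-01-02 sale, 58 days earlier, while the one-day frame does not; with 364 PRECEDING on sparse data the ROWS frame can silently cover several years.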

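So, using the test/sample data below, is it possible to use windowing with some kind of WHERE condition so that the sum only goes back at most 364 days?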

WITH samples AS (
  SELECT "1" AS SKU, DATE("2018-10-27") AS CalendarFullDate, 86.0 AS DailySalesQty UNION ALL (
  SELECT "1" AS SKU, DATE("2018-10-20"), 84.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-09-29"), 88.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-09-14"), 42.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-09-01"), 21.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-05-05"), 25.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-04-28"), 97.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-03-31"), 244.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-03-24"), 68.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-02-23"), 52.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-02-10"), 48.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-01-21"), 243.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-01-18"), 2.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-01-06"), 190.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-12-26"), 310.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-12-09"), 240.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-11-03"), 30.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-10-21"), 164.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-09-30"), 44.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-09-09"), 55.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-09-01"), 35.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-05-20"), 60.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-05-06"), 68.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-04-15"), 136.0) UNION ALL (

  SELECT "2" AS SKU, DATE("2018-10-24"), 46.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-10-18"), 56.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-09-16"), 19.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-09-02"), 42.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-09-01"), 45.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-07-05"), 25.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-06-28"), 210.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-05-31"), 44.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-05-24"), 168.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-04-23"), 152.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-03-10"), 8.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-02-21"), 23.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-01-18"), 20.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-01-06"), 10.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-12-26"), 30.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-11-09"), 1240.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-11-03"), 323.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-10-21"), 123.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-09-30"), 444.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-09-09"), 555.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-08-01"), 35.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-06-20"), 6.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-05-06"), 68.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-04-15"), 136.0) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-04-09"), 136.0)
)

SELECT 
  SKU, 
  CalendarFullDate, 
  SUM(DailySalesQty) OVER(win)
FROM
  samples WINDOW win AS (
    PARTITION BY
      SKU
    ORDER BY
      CalendarFullDate 
    RANGE BETWEEN DATE_SUB(CalendarFullDate, INTERVAL 364 DAY) AND CalendarFullDate)

I know you can't do the above with RANGE, but it's pseudocode for what I actually want to do. I also tried a WHERE clause, but that isn't allowed there.

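Is this even possible with windowing? It would be a nice, clean approach, but I'm not sure whether a condition like this can be expressed for a window aggregation.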

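Note: this is a simplified version of the real data, which is partitioned by 5 fields, aggregates 20-odd measures, and is a huge dataset (1 TB), so I'd also like the solution to be efficient.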

Any ideas?

Cheers!

1 Answer

Answer (score: 2)

The following is for BigQuery Standard SQL:

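#standardSQL
SELECT
    SKU,
    CalendarFullDate,
    SUM(DailySalesQty) OVER(win) SalesQty365days
FROM (
  SELECT
    SKU,
    CalendarFullDate,
    DailySalesQty,
    -- days since 1970-01-01 as INT64, so it can serve as a numeric RANGE key
    UNIX_DATE(CalendarFullDate) unix_days
  FROM samples
)
WINDOW win AS (
  PARTITION BY SKU ORDER BY unix_days
  -- frame: rows of the same SKU with unix_days in [current - 364, current]
  RANGE BETWEEN 364 PRECEDING AND CURRENT ROW
)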

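The trick here is to convert the DATE field CalendarFullDate into an INTEGER count of days since the epoch with UNIX_DATE, so that it can be used in the ORDER BY and RANGE parts of the WINDOW clause. A RANGE frame with a numeric offset such as 364 PRECEDING needs a single numeric ORDER BY key; once the key is a day number, the frame covers 365 calendar days no matter how many of those days actually had sales.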
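As a rough cross-check of what the query computes, here is a Python sketch (it illustrates the semantics only; it is not how BigQuery executes the query) applied to the SKU "1" rows of the sample data. Dates become integer day numbers like UNIX_DATE, and for each row the frame is every row of the same SKU whose day number lies in [day - 364, day], located with binary search over prefix sums. The names `unix_date` and `range_window_sum` are hypothetical helpers for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date
from itertools import accumulate

EPOCH = date(1970, 1, 1)


def unix_date(d: date) -> int:
    """Days since 1970-01-01, mirroring BigQuery's UNIX_DATE."""
    return (d - EPOCH).days


def range_window_sum(rows, preceding=364):
    """Emulate SUM(qty) OVER (PARTITION BY sku ORDER BY unix_days
    RANGE BETWEEN `preceding` PRECEDING AND CURRENT ROW).

    rows: iterable of (sku, date, qty). Returns {(sku, date): windowed sum}.
    """
    partitions = defaultdict(list)
    for sku, d, qty in rows:
        partitions[sku].append((unix_date(d), d, qty))

    result = {}
    for sku, items in partitions.items():
        items.sort()  # ORDER BY unix_days within the partition
        days = [day for day, _, _ in items]
        # prefix[i] = sum of qty for items[0:i]
        prefix = [0.0] + list(accumulate(qty for _, _, qty in items))
        for day, d, _ in items:
            lo = bisect_left(days, day - preceding)  # first row inside the frame
            hi = bisect_right(days, day)  # CURRENT ROW includes same-day peers
            result[(sku, d)] = prefix[hi] - prefix[lo]
    return result


# SKU "1" rows from the question's sample data.
SKU1 = [
    ("2018-10-27", 86.0), ("2018-10-20", 84.0), ("2018-09-29", 88.0),
    ("2018-09-14", 42.0), ("2018-09-01", 21.0), ("2018-05-05", 25.0),
    ("2018-04-28", 97.0), ("2018-03-31", 244.0), ("2018-03-24", 68.0),
    ("2018-02-23", 52.0), ("2018-02-10", 48.0), ("2018-01-21", 243.0),
    ("2018-01-18", 2.0), ("2018-01-06", 190.0), ("2017-12-26", 310.0),
    ("2017-12-09", 240.0), ("2017-11-03", 30.0), ("2017-10-21", 164.0),
    ("2017-09-30", 44.0), ("2017-09-09", 55.0), ("2017-09-01", 35.0),
    ("2017-05-20", 60.0), ("2017-05-06", 68.0), ("2017-04-15", 136.0),
]
rows = [("1", date.fromisoformat(s), q) for s, q in SKU1]
sums = range_window_sum(rows)
print(sums[("1", date(2018, 10, 27))])  # 1870.0
```

Because the frame is defined by day numbers rather than row counts, days without sales simply contribute nothing: the 2018-10-27 total only includes rows dated 2017-10-28 or later.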