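I am trying to use a window function to compute, for each SKU and each day, the sum of the daily sales quantity over the last 365 days. If there were a sale on every single day, I could simply use ROWS with PRECEDING, e.g.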
ORDER BY
CalendarFullDate ROWS BETWEEN 364 PRECEDING AND CURRENT ROW
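In this case, however, the dates are not evenly spaced: there are many days with no sales at all, so I cannot just go back 364 rows and assume each row is one day.
So, using the test/sample data below, is it possible to combine windowing with some kind of WHERE clause so that I only sum over the preceding 364 days at most?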
WITH samples AS (
SELECT "1" AS SKU, DATE("2018-10-27") AS CalendarFullDate, 86.0 AS DailySalesQty UNION ALL (
SELECT "1" AS SKU, DATE("2018-10-20"), 84.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2018-09-29"), 88.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2018-09-14"), 42.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2018-09-01"), 21.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2018-05-05"), 25.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2018-04-28"), 97.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2018-03-31"), 244.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2018-03-24"), 68.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2018-02-23"), 52.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2018-02-10"), 48.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2018-01-21"), 243.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2018-01-18"), 2.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2018-01-06"), 190.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2017-12-26"), 310.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2017-12-09"), 240.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2017-11-03"), 30.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2017-10-21"), 164.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2017-09-30"), 44.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2017-09-09"), 55.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2017-09-01"), 35.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2017-05-20"), 60.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2017-05-06"), 68.0 ) UNION ALL (
SELECT "1" AS SKU, DATE("2017-04-15"), 136.0) UNION ALL (
SELECT "2" AS SKU, DATE("2018-10-24"), 46.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-10-18"), 56.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-09-16"), 19.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-09-02"), 42.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-09-01"), 45.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-07-05"), 25.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-06-28"), 210.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-05-31"), 44.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-05-24"), 168.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-04-23"), 152.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-03-10"), 8.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-02-21"), 23.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-01-18"), 20.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2018-01-06"), 10.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2017-12-26"), 30.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2017-11-09"), 1240.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2017-11-03"), 323.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2017-10-21"), 123.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2017-09-30"), 444.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2017-09-09"), 555.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2017-08-01"), 35.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2017-06-20"), 6.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2017-05-06"), 68.0 ) UNION ALL (
SELECT "2" AS SKU, DATE("2017-04-15"), 136.0) UNION ALL (
SELECT "2" AS SKU, DATE("2017-04-09"), 136.0)
)
SELECT
  SKU,
  CalendarFullDate,
  SUM(DailySalesQty) OVER(win)
FROM
  samples
WINDOW win AS (
  PARTITION BY
    SKU
  ORDER BY
    CalendarFullDate
  RANGE BETWEEN DATE_TRUNC(CalendarFullDate, INTERVAL 364 DAY) AND CalendarFullDate)
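I know you cannot write RANGE like that, but it is pseudocode for what I actually want to do. I also tried a WHERE clause inside the window, but that is not allowed.
Is this even possible with window functions? It would be a nice, clean approach, but I am not sure whether such a condition can be expressed for a window aggregate.
Note: this is a simplified version of the real data, which has 5 fields in the partition, aggregates 20-odd measures, and is a huge dataset (1 TB), so I would also like the solution to be efficient.
Any ideas?
Cheers!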
Answer 0 (score: 2)
Below is for BigQuery Standard SQL:
#standardSQL
SELECT
  SKU,
  CalendarFullDate,
  SUM(DailySalesQty) OVER(win) AS SalesQty365days
FROM (
  SELECT
    SKU,
    CalendarFullDate,
    DailySalesQty,
    -- days since 1970-01-01, so the window can be ordered by a number
    UNIX_DATE(CalendarFullDate) AS unix_days
  FROM samples
)
WINDOW win AS (
  PARTITION BY SKU ORDER BY unix_days
  -- value-based frame: the current day plus the 364 days before it
  RANGE BETWEEN 364 PRECEDING AND CURRENT ROW
)
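The trick here is to convert the CalendarFullDate field from the DATE type into an INTEGER number of days since the epoch, so that it can be used in both the ORDER BY and the RANGE parts of the WINDOW expression. A numeric RANGE boundary such as 364 PRECEDING needs a numeric ORDER BY expression, and unlike ROWS it measures distance in values (days here) rather than in rows.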
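To see the difference between the row-based and the value-based frame, here is a minimal sketch in BigQuery Standard SQL. The three-row `tiny` CTE and the two output column names are made up for illustration; the rows are taken from the SKU "1" sample data in the question.

```sql
#standardSQL
-- Hypothetical cut-down sample: two sales a week apart, one more than a year earlier.
WITH tiny AS (
  SELECT "1" AS SKU, DATE("2018-10-27") AS CalendarFullDate, 86.0 AS DailySalesQty UNION ALL
  SELECT "1", DATE("2018-10-20"), 84.0 UNION ALL
  SELECT "1", DATE("2017-04-15"), 136.0
)
SELECT
  SKU,
  CalendarFullDate,
  -- Row-based frame: counts rows, so it also picks up the 2017-04-15 sale.
  SUM(DailySalesQty) OVER (
    PARTITION BY SKU ORDER BY CalendarFullDate
    ROWS BETWEEN 364 PRECEDING AND CURRENT ROW
  ) AS SalesQtyLast365Rows,
  -- Value-based frame: only rows whose day number is within 364 of the current row.
  SUM(DailySalesQty) OVER (
    PARTITION BY SKU ORDER BY UNIX_DATE(CalendarFullDate)
    RANGE BETWEEN 364 PRECEDING AND CURRENT ROW
  ) AS SalesQtyLast365Days
FROM tiny
ORDER BY SKU, CalendarFullDate
```

For the 2018-10-27 row the ROWS version should return 306.0 (all three rows), while the RANGE version should return 170.0 (only the two October sales), because the 2017-04-15 sale is more than 364 days earlier.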