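I (think I) have an interesting windowing scenario, related to tracking stock availability for a website, that should be doable in Standard SQL. The goal is to build a view of the stock position for a given product, based on events that push to and pull from the on-hand quantity.
We have 3 event types relevant to this problem:
StocklevelUpdated (PUSH): Every day at midnight we get a fresh update from the warehouse on the onHandQty availability level of a given product. This is basically a hard 'reset' of onHandQty for each product, giving the new value for the next day's trading. (Note: a record is sent every night, even if nothing has changed.)
OrderAccepted (PULL): Throughout the day there are many 'OrderAccepted' events for a product. Each carries a negative 'onHandQtyDelta' (there is now less to sell), e.g. -2 for an order of 2 units.
OrderCancelled: Cancellations of a quantity of a product can also occur. These carry a positive 'onHandQtyDelta', because the stock is added back to what is available for sale.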
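Below is a tabular view, in chronological order, of a slightly simplified version of the data I want to end up with (note: this shows a single product, but of course there are many).
onHandQtyDelta - the change to onHandQty caused by this event
onHandQty - the net on-hand quantity at that point in time, after the delta has been applied.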
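While the picture above shows every value nicely filled in (note that 29 is one of those midnight resets), in reality not all of this data is available: for each event type, one of the two values is missing and has to be derived.
onHandQty: The only rows in the real dataset with an absolute onHandQty are the 'StocklevelUpdated' events. In essence, each one 'resets' the value for a product at midnight (e.g. 29). For every other row, the logic has to trace back to the nearest of these events. On the reset rows themselves, however, onHandQtyDelta has to be derived.
onHandQtyDelta: Only the OrderAccepted and OrderCancelled events have this value, and it has to be used to calculate onHandQty.
A picture says a thousand words, so the real state of the data looks like this: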
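How can this be done efficiently (assume there are millions of rows)?
My thought was to use a window and the 'LAG' function to look back at the previous record's onHandQty, and then add or subtract the delta to arrive at the new onHandQty.
The problem is that this is recursive: the previous event itself has to look back at its predecessor, and so on, until you reach a StocklevelUpdated event, since that is the only one with an actual value, and then work forward from there. But how do you do that with a window when you don't know how far back you have to go to reach such an event? There can be any number of OrderAccepted and OrderCancelled events in between (or none!).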
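Maybe some clever use of arrays would work: collect the rows for a given product into an array and run some array aggregation function over it?
I think I have fixed on windowing as the way to go, and I may be overlooking a simpler solution! Sorry for all the detail, but I didn't want to be vague about what I need help with.
Below is a working query that builds a starting test dataset (I just sorted it by product and time to create the images).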
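On the "how far back" question above: as a minimal sketch, a window function can carry the most recent non-NULL value forward without knowing the distance, using `LAST_VALUE(... IGNORE NULLS)` over an unbounded preceding frame. The CTE name `events` and the output column `lastResetQty` are made up for this illustration, and the rows are a cut-down copy of the test data for one product. This only finds the baseline for each row. The deltas since that reset still have to be summed separately.

```sql
#standardSQL
-- Sketch only: `events` and `lastResetQty` are hypothetical names.
WITH events AS (
  SELECT 'StocklevelUpdated' AS eventName,
         TIMESTAMP('2017-06-29T23:59:59') AS stockLevelEventAt,
         'PRODUCT_190035001612' AS productId,
         CAST(NULL AS INT64) AS onHandQtyDelta,
         23 AS onHandQty UNION ALL
  SELECT 'OrderAccepted',  TIMESTAMP('2017-06-30T01:02:20'), 'PRODUCT_190035001612', -2, NULL UNION ALL
  SELECT 'OrderAccepted',  TIMESTAMP('2017-06-30T02:19:20'), 'PRODUCT_190035001612', -3, NULL UNION ALL
  SELECT 'OrderCancelled', TIMESTAMP('2017-06-30T13:02:20'), 'PRODUCT_190035001612',  2, NULL
)
SELECT
  *,
  -- Most recent non-NULL onHandQty at or before this row, however many
  -- order events lie in between.
  LAST_VALUE(onHandQty IGNORE NULLS) OVER (
    PARTITION BY productId
    ORDER BY stockLevelEventAt
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS lastResetQty
FROM events
ORDER BY productId, stockLevelEventAt
```

With these four rows, every row should get lastResetQty = 23, since the only reset is the first event.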
WITH stock_changes AS (
SELECT
"StocklevelUpdated" AS eventName,
Timestamp("2017-06-29T23:59:59") AS stockLevelEventAt,
"PRODUCT_190035001612" AS productId,
null AS onHandQtyDelta,
23 AS onHandQty
UNION ALL (
SELECT
"StocklevelUpdated" AS eventName,
Timestamp("2017-06-29T23:59:59") AS stockLevelEventAt,
"PRODUCT_4545423454545" AS productId,
null AS onHandQtyDelta,
120 AS onHandQty)
UNION ALL (
SELECT
"OrderAccepted" AS eventName,
Timestamp("2017-06-30T01:02:20") AS stockLevelEventAt,
"PRODUCT_190035001612" AS productId,
-2 AS onHandQtyDelta,
null AS onHandQty)
UNION ALL (
SELECT
"OrderAccepted" AS eventName,
Timestamp("2017-06-30T02:19:20") AS stockLevelEventAt,
"PRODUCT_190035001612" AS productId,
-3 AS onHandQtyDelta,
null AS onHandQty)
UNION ALL (
SELECT
"OrderAccepted" AS eventName,
Timestamp("2017-06-30T05:13:20") AS stockLevelEventAt,
"PRODUCT_4545423454545" AS productId,
-3 AS onHandQtyDelta,
null AS onHandQty)
UNION ALL (
SELECT
"OrderCancelled" AS eventName,
Timestamp("2017-06-30T13:02:20") AS stockLevelEventAt,
"PRODUCT_190035001612" AS productId,
+2 AS onHandQtyDelta,
null AS onHandQty)
UNION ALL (
SELECT
"OrderCancelled" AS eventName,
Timestamp("2017-06-30T11:02:20") AS stockLevelEventAt,
"PRODUCT_4545423454545" AS productId,
2 AS onHandQtyDelta,
null AS onHandQty)
UNION ALL (
SELECT
"StocklevelUpdated" AS eventName,
Timestamp("2017-06-30T23:59:59") AS stockLevelEventAt,
"PRODUCT_190035001612" AS productId,
null AS onHandQtyDelta,
29 AS onHandQty)
UNION ALL (
SELECT
"StocklevelUpdated" AS eventName,
Timestamp("2017-06-30T23:59:59") AS stockLevelEventAt,
"PRODUCT_4545423454545" AS productId,
null AS onHandQtyDelta,
140 AS onHandQty)
)
SELECT *
FROM stock_changes
order by productId, stockLevelEventAt ASC
Answer 0 (score: 1)
Below is for BigQuery Standard SQL
#standardSQL
WITH stock_changes AS (
SELECT "StocklevelUpdated" AS eventName, TIMESTAMP("2017-06-29T23:59:59") AS stockLevelEventAt,
"PRODUCT_190035001612" AS productId, NULL AS onHandQtyDelta, 23 AS onHandQty UNION ALL
SELECT "StocklevelUpdated", TIMESTAMP("2017-06-29T23:59:59"),"PRODUCT_4545423454545",NULL, 120 UNION ALL
SELECT "OrderAccepted", TIMESTAMP("2017-06-30T01:02:20"),"PRODUCT_190035001612",-2, NULL UNION ALL
SELECT "OrderAccepted", TIMESTAMP("2017-06-30T02:19:20"),"PRODUCT_190035001612",-3, NULL UNION ALL
SELECT "OrderAccepted", TIMESTAMP("2017-06-30T05:13:20"),"PRODUCT_4545423454545",-3, NULL UNION ALL
SELECT "OrderCancelled", TIMESTAMP("2017-06-30T13:02:20"),"PRODUCT_190035001612",+2, NULL UNION ALL
SELECT "OrderCancelled", TIMESTAMP("2017-06-30T11:02:20"),"PRODUCT_4545423454545",2, NULL UNION ALL
SELECT "StocklevelUpdated", TIMESTAMP("2017-06-30T23:59:59"),"PRODUCT_190035001612",NULL, 29 UNION ALL
SELECT "StocklevelUpdated", TIMESTAMP("2017-06-30T23:59:59"),"PRODUCT_4545423454545",NULL, 140
)
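SELECT
  eventName, stockLevelEventAt, productId,
  delta AS onHandQtyDelta, IFNULL(onHandQty, onHand) AS onHandQty
FROM (
  -- Walk backwards from the end-of-day reset: summing (onHandQty - delta) over
  -- all later rows of the same product and day gives the level after this row.
  SELECT *,
    SUM(IFNULL(onHandQty, 0) - delta)
      OVER(PARTITION BY productId, FORMAT_TIMESTAMP('%Y-%m-%d', stockLevelEventAt)
           ORDER BY stockLevelEventAt DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS onHand
  FROM (
    SELECT eventName, stockLevelEventAt, productId, onHandQty,
      CASE
        -- Order events and the first reset: keep the row's own delta (0 if NULL).
        WHEN prev IS NULL THEN IFNULL(onHandQtyDelta, 0)
        -- Later resets: the change not explained by that day's orders.
        ELSE onHandQty - prev - delta
      END AS delta
    FROM (
      SELECT *,
        -- Running total of order deltas per product and calendar day.
        SUM(IFNULL(onHandQtyDelta, 0))
          OVER(PARTITION BY productId, FORMAT_TIMESTAMP('%Y-%m-%d', stockLevelEventAt)
               ORDER BY stockLevelEventAt) AS delta,
        -- Previous onHandQty for the same product and event type (set only on resets).
        LAG(onHandQty) OVER(PARTITION BY productId, eventName ORDER BY stockLevelEventAt) AS prev
      FROM stock_changes
    )
  )
)
ORDER BY productId, stockLevelEventAt ASC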
The result is as follows:
Row  eventName          stockLevelEventAt        productId              onHandQtyDelta  onHandQty
1    StocklevelUpdated  2017-06-29 23:59:59 UTC  PRODUCT_190035001612   0               23
2    OrderAccepted      2017-06-30 01:02:20 UTC  PRODUCT_190035001612   -2              21
3    OrderAccepted      2017-06-30 02:19:20 UTC  PRODUCT_190035001612   -3              18
4    OrderCancelled     2017-06-30 13:02:20 UTC  PRODUCT_190035001612   2               20
5    StocklevelUpdated  2017-06-30 23:59:59 UTC  PRODUCT_190035001612   9               29
6    StocklevelUpdated  2017-06-29 23:59:59 UTC  PRODUCT_4545423454545  0               120
7    OrderAccepted      2017-06-30 05:13:20 UTC  PRODUCT_4545423454545  -3              117
8    OrderCancelled     2017-06-30 11:02:20 UTC  PRODUCT_4545423454545  2               119
9    StocklevelUpdated  2017-06-30 23:59:59 UTC  PRODUCT_4545423454545  21              140
Most likely this can be optimized further, but I focused on getting the logic right rather than on optimization.
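As one possible simplification, here is a sketch of a different technique from the query above: number the resets per product with a running `COUNTIF`, then take a running sum inside each reset group. The names `grouped` and `resetGroup` are made up for this illustration. It assumes timestamps are unique per product, and that every product's history starts with a StocklevelUpdated row. Rows before the first reset would only accumulate deltas, with no baseline.

```sql
#standardSQL
-- Sketch only: `grouped` and `resetGroup` are hypothetical names.
WITH stock_changes AS (
  SELECT 'StocklevelUpdated' AS eventName,
         TIMESTAMP('2017-06-29T23:59:59') AS stockLevelEventAt,
         'PRODUCT_190035001612' AS productId,
         CAST(NULL AS INT64) AS onHandQtyDelta,
         23 AS onHandQty UNION ALL
  SELECT 'StocklevelUpdated', TIMESTAMP('2017-06-29T23:59:59'), 'PRODUCT_4545423454545', NULL, 120 UNION ALL
  SELECT 'OrderAccepted',     TIMESTAMP('2017-06-30T01:02:20'), 'PRODUCT_190035001612',  -2, NULL UNION ALL
  SELECT 'OrderAccepted',     TIMESTAMP('2017-06-30T02:19:20'), 'PRODUCT_190035001612',  -3, NULL UNION ALL
  SELECT 'OrderAccepted',     TIMESTAMP('2017-06-30T05:13:20'), 'PRODUCT_4545423454545', -3, NULL UNION ALL
  SELECT 'OrderCancelled',    TIMESTAMP('2017-06-30T13:02:20'), 'PRODUCT_190035001612',   2, NULL UNION ALL
  SELECT 'OrderCancelled',    TIMESTAMP('2017-06-30T11:02:20'), 'PRODUCT_4545423454545',  2, NULL UNION ALL
  SELECT 'StocklevelUpdated', TIMESTAMP('2017-06-30T23:59:59'), 'PRODUCT_190035001612', NULL, 29 UNION ALL
  SELECT 'StocklevelUpdated', TIMESTAMP('2017-06-30T23:59:59'), 'PRODUCT_4545423454545', NULL, 140
),
grouped AS (
  SELECT *,
    -- Count of resets seen so far for this product: each StocklevelUpdated
    -- row starts a new group that also holds the order events following it.
    COUNTIF(eventName = 'StocklevelUpdated') OVER (
      PARTITION BY productId ORDER BY stockLevelEventAt
    ) AS resetGroup
  FROM stock_changes
)
SELECT
  eventName, stockLevelEventAt, productId, onHandQtyDelta,
  -- Inside a group the first row contributes the absolute level and every
  -- later row contributes its delta, so the running sum is the level.
  SUM(COALESCE(onHandQty, onHandQtyDelta)) OVER (
    PARTITION BY productId, resetGroup
    ORDER BY stockLevelEventAt
  ) AS onHandQty
FROM grouped
ORDER BY productId, stockLevelEventAt
```

On the test data this should give the same onHandQty column as the result above (23, 21, 18, 20, 29 for the first product and 120, 117, 119, 140 for the second). It leaves onHandQtyDelta as NULL on the reset rows rather than deriving it. Because the groups come from the resets themselves and not from the calendar date, it does not depend on the reset landing on the same day as the orders it closes.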