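I came up with the following query to calculate the inventory balance for each day. The query works and gives the expected results, but it takes about 200 seconds to run on a subset of the transactions table, about 2 [sic] rows. Being new to BigQuery, I would like to know whether there is a better or more efficient way to do this.
Below is the code with some sample data. Thanks in advance for any ideas or hints.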
#### Generate a continuous date range
WITH days AS
(
SELECT day
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2011-01-01'), CURRENT_DATE(), INTERVAL 1 DAY)) AS day
),
#### Transactional information of inventory movements. Simple example
movements AS
(
SELECT 1 AS ItemID
,1 AS Location
,DATE('2017-12-01') AS TransactionDate
,0 AS Quantity
UNION ALL SELECT 1, 1, DATE('2017-12-03'), 10
UNION ALL SELECT 1, 1, DATE('2017-12-06'), 100
UNION ALL SELECT 1, 1, DATE('2017-12-12'), 1000
),
#### Calculate cumulative sum for each item and location based on the transaction date
cumsum AS
(
SELECT ItemID
,TransactionDate
,Location
,SUM(Quantity) OVER (PARTITION BY ItemID, Location ORDER BY TransactionDate ROWS UNBOUNDED PRECEDING) as cumulative_quantity
FROM movements
),
#### Cross join with the date range to backfill cumulative values for each day
#### This will return multiple lines for a day when there are multiple transaction date balances
cross_sum AS
(
SELECT m.ItemID
,m.Location
,d.day
,m.TransactionDate
,m.cumulative_quantity
FROM days d
CROSS JOIN cumsum m
WHERE m.TransactionDate <= d.day
),
#### Get just one line per day, based on the latest transaction date
filtered AS
(
SELECT ItemID
,Location
,CAST (day AS datetime) AS BalanceDate
,ARRAY_AGG(cumulative_quantity ORDER BY TransactionDate DESC LIMIT 1) AS InventoryBalance
FROM cross_sum
GROUP BY 1,2,3
)
#### Final result, flattened out
SELECT ItemID
,Location
,BalanceDate
,(SELECT SUM(InventoryBalance) FROM UNNEST(InventoryBalance) AS InventoryBalance) AS InventoryBalance
FROM filtered
ORDER BY 1,2,3
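Answer 0 (score: 2)
"I would like to know whether there is a better or more efficient way to do this?"
Below is for BigQuery Standard SQL.
As you can see, days, cumsum and cross_sum are modified/optimized, and the remaining steps (filtered and the final flattening) are eliminated. It should be more efficient, but that needs to be tested on real data, so you should try it and see whether it helps.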
#standardSQL
#### Transactional information of inventory movements. Simple example
WITH movements AS (
SELECT 1 AS ItemID, 1 AS Location, DATE('2017-12-01') AS TransactionDate, 0 AS Quantity UNION ALL
SELECT 1, 1, DATE('2017-12-03'), 10 UNION ALL
SELECT 1, 1, DATE('2017-12-06'), 100 UNION ALL
SELECT 1, 1, DATE('2017-12-12'), 1000
), days AS (
SELECT day, ItemID, Location
FROM UNNEST(GENERATE_DATE_ARRAY((SELECT MIN(TransactionDate) AS d FROM movements), CURRENT_DATE(), INTERVAL 1 DAY)) AS day
CROSS JOIN (SELECT DISTINCT ItemID, Location FROM movements)
), cumsum AS (
SELECT ItemID
,TransactionDate
,Location
,LEAD(TransactionDate) OVER(PARTITION BY ItemID, Location ORDER BY TransactionDate) AS NextTransactionDate
,SUM(Quantity) OVER(PARTITION BY ItemID, Location ORDER BY TransactionDate ROWS UNBOUNDED PRECEDING) AS cumulative_quantity
FROM movements
), cross_sum AS (
SELECT d.ItemID
,d.Location
,d.day AS BalanceDate
,m.cumulative_quantity
FROM days d
JOIN cumsum m
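#### Match on item and location too, so balances of different items never mix
ON d.ItemID = m.ItemID
AND d.Location = m.Location
AND d.day >= IFNULL(m.TransactionDate, d.day)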
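#### The upper bound is exclusive, so use tomorrow to keep today's balance for the latest transaction
AND d.day < IFNULL(m.NextTransactionDate, DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY))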
)
SELECT ItemID
,Location
,BalanceDate
,cumulative_quantity
FROM cross_sum
ORDER BY 1,2,3
The result is:
ItemID Location BalanceDate cumulative_quantity
1 1 2017-12-01 0
1 1 2017-12-02 0
1 1 2017-12-03 10
1 1 2017-12-04 10
1 1 2017-12-05 10
1 1 2017-12-06 110
1 1 2017-12-07 110
1 1 2017-12-08 110
1 1 2017-12-09 110
1 1 2017-12-10 110
1 1 2017-12-11 110
1 1 2017-12-12 1110
1 1 2017-12-13 1110
1 1 2017-12-14 1110
1 1 2017-12-15 1110
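A possible further variation, offered only as a sketch and not benchmarked: since each cumulative balance is valid from its TransactionDate until the day before the next transaction, the days can be generated per balance row with GENERATE_DATE_ARRAY, which avoids the join against a separate days table altogether. The daily CTE is an assumption added here, not part of the queries above; it collapses several movements on the same date into one row so that ties in the window ordering cannot produce partial balances.

```sql
#standardSQL
WITH movements AS (
  SELECT 1 AS ItemID, 1 AS Location, DATE('2017-12-01') AS TransactionDate, 0 AS Quantity UNION ALL
  SELECT 1, 1, DATE('2017-12-03'), 10 UNION ALL
  SELECT 1, 1, DATE('2017-12-06'), 100 UNION ALL
  SELECT 1, 1, DATE('2017-12-12'), 1000
), daily AS (
  #### Assumed extra step: one row per item, location and date
  SELECT ItemID, Location, TransactionDate, SUM(Quantity) AS Quantity
  FROM movements
  GROUP BY ItemID, Location, TransactionDate
), cumsum AS (
  SELECT ItemID
    ,Location
    ,TransactionDate
    ,LEAD(TransactionDate) OVER(PARTITION BY ItemID, Location ORDER BY TransactionDate) AS NextTransactionDate
    ,SUM(Quantity) OVER(PARTITION BY ItemID, Location ORDER BY TransactionDate ROWS UNBOUNDED PRECEDING) AS cumulative_quantity
  FROM daily
)
SELECT c.ItemID
  ,c.Location
  ,BalanceDate
  ,c.cumulative_quantity
FROM cumsum c
#### Expand each balance row into the days it covers: from its own date up to
#### the day before the next transaction, or up to today for the latest one
CROSS JOIN UNNEST(GENERATE_DATE_ARRAY(
  c.TransactionDate,
  IFNULL(DATE_SUB(c.NextTransactionDate, INTERVAL 1 DAY), CURRENT_DATE())
)) AS BalanceDate
ORDER BY 1,2,3
```

For the sample data this should return the same balances as the table above, continuing with 1110 up to the current date. Whether it is actually faster than the join-based version depends on the data, so it is worth comparing both on the real table.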