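I'll use a hypothetical example to illustrate a problem I have been stuck on for a long time when joining two tables:
Each order has many delivery dates and quantities:
Orders: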
OrderID DeliveryID DeliveryDate Quantity
======= ========== ============ ========
1000 001 2017-01-01 10
1000 002 2017-01-08 10
1000 003 2017-01-15 10
1001 001 .... 10
Received goods are mapped to an OrderID, not to a DeliveryID:
Received:
OrderID InvoiceID ReceivedDate ReceivedQuantity
======= ========= ============ ================
1000 1001 2017-01-01 10
1000 1002 2017-01-09 09
1000 1003 2017-01-10 01
1000 1004 2017-01-15 10
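I am now trying to join the Received table to the Orders table. However, since, for example, the second delivery was received in two separate steps, I want to join that delivery to the later of the two received rows, because only then has that delivery fully arrived.
With a plain join on OrderID, the example above with 3 deliveries and 4 receipts yields 12 joined rows in total.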
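To make the fan-out concrete, here is a minimal Python sketch that pairs the sample rows for OrderID 1000 the way a plain equi-join on OrderID would. The list names `orders`, `received` and `joined` are only illustrative and not part of the actual schema.

```python
from itertools import product

# Sample rows for OrderID 1000, copied from the tables above.
# (OrderID, DeliveryID, DeliveryDate, Quantity)
orders = [
    (1000, "001", "2017-01-01", 10),
    (1000, "002", "2017-01-08", 10),
    (1000, "003", "2017-01-15", 10),
]
# (OrderID, InvoiceID, ReceivedDate, ReceivedQuantity)
received = [
    (1000, 1001, "2017-01-01", 10),
    (1000, 1002, "2017-01-09", 9),
    (1000, 1003, "2017-01-10", 1),
    (1000, 1004, "2017-01-15", 10),
]

# A plain join on OrderID pairs every delivery with every receipt of that order.
joined = [(o, r) for o, r in product(orders, received) if o[0] == r[0]]
print(len(joined))  # 3 deliveries x 4 receipts = 12
```

The row count per order grows as deliveries times receipts, which is why the join gets large when an order has many of both.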
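So I thought of solving this by accumulating the quantities per OrderID, both for the ordered deliveries and for the receipts. My adjusted tables look like this:
Orders: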
OrderID DeliveryID DeliveryDate Quantity OrderedAccumQuant
======= ========== ============ ======== ================
1000 001 2017-01-01 10 10
1000 002 2017-01-08 10 20
1000 003 2017-01-15 10 30
1001 001 .... 10 10
Received:
OrderID InvoiceID ReceivedDate ReceivedQuantity AccumQuant
======= ========= ============ ================ ================
1000 1001 2017-01-01 10 10
1000 1002 2017-01-09 09 19
1000 1003 2017-01-10 01 20
1000 1004 2017-01-15 10 30
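As a rough illustration of how the accumulated column is derived, the following Python sketch computes a running total per OrderID over the sample Received rows. The helper `add_running_total` is a hypothetical name, and the sketch assumes the rows are already sorted by OrderID and then by date.

```python
from itertools import accumulate, groupby
from operator import itemgetter

def add_running_total(rows, key_index, qty_index):
    """Append a per-key running total to each row.

    Assumes rows are already sorted by key and then by date.
    """
    result = []
    for _, group in groupby(rows, key=itemgetter(key_index)):
        group = list(group)
        totals = accumulate(row[qty_index] for row in group)
        result.extend(row + (total,) for row, total in zip(group, totals))
    return result

# (OrderID, InvoiceID, ReceivedDate, ReceivedQuantity), from the Received table
received = [
    (1000, 1001, "2017-01-01", 10),
    (1000, 1002, "2017-01-09", 9),
    (1000, 1003, "2017-01-10", 1),
    (1000, 1004, "2017-01-15", 10),
]

received_cum = add_running_total(received, key_index=0, qty_index=3)
for row in received_cum:
    print(row)  # last element is the accumulated quantity: 10, 19, 20, 30
```

The same helper applied to the Orders rows (with the Quantity column) would give the OrderedAccumQuant values 10, 20, 30 for OrderID 1000.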
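My planned logic is now to join the Received table to the Orders table using the first Received row whose ReceivedAccumQuant (the AccumQuant column above) is equal to or greater than OrderedAccumQuant. In this example, InvoiceID 1002 is not joined to any delivery, because its accumulated received quantity (19) has not yet reached the accumulated ordered quantity (20).
Desired output: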
Ordered Rec
OrderID DelivID DeliveryDate Quant AccQuant RecDate InvoiceID AccQu
======= ======= ============ ===== ======= ======= ========= =====
1000 001 2017-01-01 10 10 .-01-01 1001 10
1000 002 2017-01-08 10 20 .-01-10 1003 20
1000 003 2017-01-15 10 30 .-01-15 1004 30
1001 001 ....
Within each OrderID, the desired output matches every DeliveryID to exactly one InvoiceID: the first one where Received_accumulated_quantity >= ordered_accumulated_quantity.
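The matching rule can be sketched in Python as follows. This only illustrates the intended logic on the sample data and is not a HANA solution. The function `match_deliveries` is a hypothetical name, and the sketch assumes that receipts are sorted by date within each order and that quantities are non-negative, so the accumulated quantities never decrease and a binary search applies.

```python
from bisect import bisect_left
from collections import defaultdict

# (OrderID, DeliveryID, DeliveryDate, Quantity, OrderedAccumQuant)
orders_cum = [
    (1000, "001", "2017-01-01", 10, 10),
    (1000, "002", "2017-01-08", 10, 20),
    (1000, "003", "2017-01-15", 10, 30),
]
# (OrderID, InvoiceID, ReceivedDate, ReceivedQuantity, AccumQuant)
received_cum = [
    (1000, 1001, "2017-01-01", 10, 10),
    (1000, 1002, "2017-01-09", 9, 19),
    (1000, 1003, "2017-01-10", 1, 20),
    (1000, 1004, "2017-01-15", 10, 30),
]

def match_deliveries(orders_cum, received_cum):
    """Map each delivery to the first receipt whose AccumQuant >= OrderedAccumQuant."""
    receipts_by_order = defaultdict(list)
    for row in received_cum:
        receipts_by_order[row[0]].append(row)
    # Accumulated quantities per order, in receipt order (non-decreasing).
    totals_by_order = {
        order_id: [r[4] for r in rows] for order_id, rows in receipts_by_order.items()
    }

    matches = []
    for order_id, delivery_id, _, _, ordered_accum in orders_cum:
        receipts = receipts_by_order.get(order_id, [])
        totals = totals_by_order.get(order_id, [])
        i = bisect_left(totals, ordered_accum)  # first index with total >= ordered_accum
        invoice_id = receipts[i][1] if i < len(receipts) else None  # None: not yet fully received
        matches.append((order_id, delivery_id, invoice_id))
    return matches

for match in match_deliveries(orders_cum, received_cum):
    print(match)
```

On the sample data this pairs delivery 001 with invoice 1001, 002 with 1003, and 003 with 1004, and it never pairs a delivery with more than one receipt.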
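My two approaches:
Memory allocation problem
One approach is an inner join against a subquery, followed by a RANK() over the joined rows, partitioned by OrderID and DeliveryID, with a WHERE condition Received.Cum_Quant >= Orders.Cum_Quant. With the outermost WHERE condition CUM_RANK = 1, only the first receipt that satisfies the condition is kept for each DeliveryID.
This solution works well for a small part of my data set, but it fails with a memory allocation error as soon as I use the full data set, because many received rows are joined to each ordered row before they are filtered down by CUM_RANK = 1. With roughly 5 million planned delivery dates, 5 million receipt dates, and up to 100 deliveries per OrderID, the joined table becomes very large: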
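SELECT *
FROM (
    SELECT
        Orders.OrderID
       ,Orders.DeliveryID
       ,Orders.DeliveryDate
       ,Orders.Quantity
       ,Orders.Cum_Quant AS OrderedCumQuant
       ,Received.InvoiceID
       ,Received.ReceivedDate
       ,Received.Cum_Quant AS ReceivedCumQuant
        -- rank the qualifying receipts of each delivery by accumulated received quantity
       ,RANK() OVER (PARTITION BY Orders.OrderID, Orders.DeliveryID ORDER BY Received.Cum_Quant) AS CUM_RANK
    FROM Orders
    JOIN (
        -- running total of the received quantity per order
        SELECT
            OrderID
           ,InvoiceID
           ,ReceivedDate
           ,RANK() OVER (PARTITION BY OrderID ORDER BY ReceivedDate) AS ReceivedRank
           ,SUM(ReceivedQuantity) OVER (PARTITION BY OrderID ORDER BY ReceivedDate) AS Cum_Quant
        FROM Received
    ) AS Received
      ON Orders.OrderID = Received.OrderID
    WHERE Received.Cum_Quant >= Orders.Cum_Quant
) AS Joined
WHERE CUM_RANK = 1;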
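My error message is: SAP DBTech JDBC: [4]: cannot allocate enough memory: please check traces for further information
Any ideas on how to solve this?
Problem accessing the main table
My other guess was to compare Received.AccumQuant with Orders.AccumQuant inside the SELECT subquery of the JOIN, so that far fewer rows get joined. But from inside the JOIN subquery you cannot access the Orders table: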
SELECT *
FROM Orders
JOIN (
    SELECT *
    FROM (
        SELECT
            *
           ,ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY ReceivedDate ASC) AS RowNumb
        FROM Received
        WHERE Orders.OrderID = Received.OrderID
          AND Received.AccumQuant >= Orders.AccumQuant -- this does not work: the Orders table cannot be accessed from here
    ) AS DeliveryRanked
) AS Received
ON Orders.OrderID = Received.OrderID;
Answer 0 (score: 0)
You can try this, but I'm not sure I fully understood your requirement.