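I am trying to update a table inside a loop, and it is taking far too long. I need help making it more efficient.
Some background on the problem and the approach I took. I have the following table,
Gift_Earned_Used: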
customer earned_id earned_day earned_type used_id used_day used_type
6832 1234 '01-JAN-19' Free Pizza null null null
6832 1771 '03-JAN-19' Free Pizza null null null
6506 1901 '07-JAN-19' Free Coffee null null null
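The table currently has 33 million rows, with used_id, used_day and used_type all null. It lists every customer who has earned a gift of any type (Free Pizza, Free Coffee, Free Bread), along with the corresponding transaction ID (earned_id) and transaction day (earned_day).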
The other table,
Gift_Used:
customer used_id used_day used_type ear_pos_earned_day
6832 1339 '31-DEC-18' Free Pizza '02-DEC-18'
6832 1821 '03-JAN-19' Free Pizza '04-DEC-18'
6506 2454 '07-JAN-19' Free Coffee '08-JAN-19'
It currently has 19 million rows.
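The problem is that when a customer uses a gift, there is no way to tie that specific used gift to the gift that was earned. earned_id and used_id are just transaction IDs. To link them, we assume a first-in, first-out approach.
Under this assumption, the first gift used is matched to the first gift earned for the same customer and gift type. We also need to make sure that used_day is not earlier than earned_day (you cannot use a gift you have not earned yet). More specifically, earned_day must fall between ear_pos_earned_day and used_day.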
To do this, I loop through the Gift_Used table and update the null values in Gift_Earned_Used wherever a match exists, so that Gift_Earned_Used looks like this after the update:
customer earned_id earned_day earned_type used_id used_day used_type
6832 1234 '01-JAN-19' Free Pizza 1821 '03-JAN-19' Free Pizza
6832 1771 '03-JAN-19' Free Pizza null null null
6506 1901 '07-JAN-19' Free Coffee 2454 '07-JAN-19' Free Coffee
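For checking results on a small sample, the matching rule described above can be sketched in plain Python. This is only an in-memory reference under stated assumptions, not something meant to run against 33 million rows: the function name `fifo_match` and the dict layout are made up for illustration, earned_id is assumed to be unique, and the sample data is just the customer 6832 rows from the two tables.

```python
from collections import defaultdict
from datetime import date


def fifo_match(earned, used):
    """Return {earned_id: used_id} using the first-in, first-out rule.

    Hypothetical helper: `earned` and `used` are lists of dicts keyed by
    the column names of Gift_Earned_Used and Gift_Used.
    """
    # Bucket the earned gifts per (customer, type), oldest first.
    pool = defaultdict(list)
    for e in sorted(earned, key=lambda r: (r["earned_day"], r["earned_id"])):
        pool[(e["customer"], e["earned_type"])].append(e)

    matches = {}
    # Walk the used gifts in the order customer, used_day, used_id.
    for u in sorted(used, key=lambda r: (r["customer"], r["used_day"], r["used_id"])):
        candidates = pool.get((u["customer"], u["used_type"]), [])
        for i, e in enumerate(candidates):
            # earned_day must lie between ear_pos_earned_day and used_day.
            if u["ear_pos_earned_day"] <= e["earned_day"] <= u["used_day"]:
                matches[e["earned_id"]] = u["used_id"]
                del candidates[i]  # an earned gift can be used only once
                break
    return matches


earned = [
    {"customer": 6832, "earned_id": 1234, "earned_day": date(2019, 1, 1), "earned_type": "Free Pizza"},
    {"customer": 6832, "earned_id": 1771, "earned_day": date(2019, 1, 3), "earned_type": "Free Pizza"},
]
used = [
    {"customer": 6832, "used_id": 1339, "used_day": date(2018, 12, 31),
     "used_type": "Free Pizza", "ear_pos_earned_day": date(2018, 12, 2)},
    {"customer": 6832, "used_id": 1821, "used_day": date(2019, 1, 3),
     "used_type": "Free Pizza", "ear_pos_earned_day": date(2018, 12, 4)},
]

result = fifo_match(earned, used)
print(result)  # {1234: 1821}
```

Used gift 1339 finds no earned gift inside its date window, so only earned gift 1234 is paired, which agrees with the customer 6832 rows in the expected table.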
I considered several use cases, and the code below achieves what I want.
DECLARE
var_earned_id NUMBER;
--looping through all the customers in the gift_used table
--and ordering it by used_day, used_id such that if there
--are two used gifts of the same type, the one with the lesser
--transaction id gets assigned first
BEGIN
FOR v_used IN
(
SELECT /*+PARALLEL(8)*/
Customer
,Used_Type
,Used_Id
,Used_Day
,ear_pos_earned_day
FROM
gift_used
ORDER BY
Customer,Used_Day,Used_Id
)
LOOP
BEGIN
--this is the part where i am getting the earned_id that matches
--the criteria. If more than one earned_id matches the criteria
--, the top one is picked (one with lesser transaction id)
SELECT Earned_Id INTO Var_Earned_Id FROM
(
SELECT Earned_Id FROM gift_earned_used
WHERE 1=1
AND Customer = v_used.Customer
AND Earned_Type = v_used.Used_Type
AND Used_Id IS NULL
AND Earned_Day BETWEEN v_used.ear_pos_earned_day AND v_used.used_day ORDER BY Earned_Day,Earned_Id
)
WHERE ROWNUM=1
;
--for the earned_id picked above that matched the criteria
--the values in the used_id and used_day are updated from loop
UPDATE /*+PARALLEL(8)*/ gift_earned_used u
SET u.used_id = v_used.Used_Id
,u.used_day = v_used.used_day
,u.used_type = v_used.Used_Type
WHERE 1=1
AND u.earned_id = Var_Earned_Id
;
EXCEPTION
WHEN NO_DATA_FOUND THEN
Var_Earned_Id := 0;
END;
END LOOP;
COMMIT;
END;
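As described above, I am able to get the desired output. I tried several ways of doing this, but logically I could only get it to work with a loop.
I tried it on a small dataset and it seems to work fine. However, when I run it on the full dataset, where the 33 million rows in gift_earned_used have to be updated from gift_used (19 million rows) wherever a match exists, it never finishes. It takes far too long.
I would really appreciate suggestions on how to improve this and make it more efficient.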
Answer 0 (score: 1)
This is for the original version of the question.
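You can write a query that gets the used_id for each earned row by interleaving the rows and using window functions.
The idea is to use a cumulative count of earned/redeemed rows for each customer/type to assign a grouping, and then use that grouping to assign the used_id. This is tricky because the cumulative count is a sum that ignores the current row for redemptions (these need to be associated with the most recent earned value).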
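To make the grouping arithmetic concrete, here is a small Python sketch that mirrors the three steps of the query for a single customer/type partition: interleave the rows, compute the running sum and the grouping, then take the next non-null used_id within the same grouping. The function name `assign_used_ids`, the integer day numbers and the IDs 101 to 103 and 201 to 202 are hypothetical, and the sketch assumes every row has a distinct day. It illustrates the arithmetic only; whether the resulting pairing satisfies the first-in, first-out and date-window rules from the question still has to be confirmed separately.

```python
def assign_used_ids(events):
    """events: list of (day, earned_id, used_id) for ONE customer/type.

    Exactly one of earned_id / used_id is set per row. Days are assumed
    to be distinct, so the running sum has no ties to worry about.
    """
    rows = []
    running = 0
    for day, earned_id, used_id in sorted(events, key=lambda e: e[0]):
        earned = 1 if earned_id is not None else -1
        running += earned                    # sum(earned) over (... order by day)
        grouping = running - max(earned, 0)  # the earned_grouping column
        rows.append({"earned_id": earned_id, "used_id": used_id, "grouping": grouping})

    result = {}
    for i, row in enumerate(rows):
        if row["earned_id"] is None:
            continue  # only earned rows are kept in the final select
        # lead(used_id ignore nulls) within the same grouping
        result[row["earned_id"]] = next(
            (r["used_id"] for r in rows[i + 1:]
             if r["grouping"] == row["grouping"] and r["used_id"] is not None),
            None,
        )
    return result


# Hypothetical timeline: earn, use, earn, use, earn.
events = [
    (1, 101, None),
    (2, None, 201),
    (3, 102, None),
    (4, None, 202),
    (5, 103, None),
]
pairs = assign_used_ids(events)
print(pairs)  # {101: 201, 102: 202, 103: None}
```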
with eu as (
select earned_id, customer, earned_day as dt, earned_type as type, null as used_id, 1 as earned
from gift_earned_used geu
union all
select null, customer, used_day as dt, used_type as type, used_id, -1 as earned
from gift_used gu
),
eu2 as (
select eu.*,
(sum(earned) over (partition by customer, type
order by dt
) -
greatest(earned, 0) -- ignore current row for redemptions
) earned_grouping
from eu
)
select eu2.*
from (select eu2.*,
lead(used_id ignore nulls) over (partition by customer, type, earned_grouping order by dt) as new_used_id
from eu2
) eu2
where used_id is null; -- only select the earned rows
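Once you have confirmed that this works, you have two approaches:
Use merge to update the original table.
I would go with the second approach, because updating every row in a table can actually be very expensive.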