I have a table containing 1220200 duplicate records.
I am using the following query to delete the duplicates.
DELETE /*+ NO_CPU_COSTING */
FROM FCST f1
WHERE ROWID >
      (SELECT MIN (ROWID)
       FROM FCST f2
       WHERE f1.DMDUNIT = f2.DMDUNIT
         AND f1.DMDGROUP = f2.DMDGROUP
         AND f1.LOC = f2.LOC
         AND f1.STARTDATE = f2.STARTDATE
         AND f1.TYPE = f2.TYPE
         AND UPPER (f1.FCSTID) = UPPER (f2.FCSTID));
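Deleting these records takes almost 2 minutes. I tried a bulk delete approach, loading the duplicates into a cursor and deleting them in bulk, but that took even longer.
What is the best way to optimize this code?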
Answer 0: (score: 0)
One simple approach is this:
-- t and cust_seg_nbr are placeholders: partition by the columns that define a duplicate
delete /*+RULE*/ from t
where rowid in ( select rid
                 from ( select rowid rid,
                               row_number() over
                                 (partition by cust_seg_nbr order by rowid) rn
                        from t
                      )
                 where rn <> 1 );
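Applied to the FCST table from the question, the same pattern might look like the sketch below. It assumes that the six columns compared in the question's query are what define a duplicate, and it leaves out the RULE hint, since the rule-based optimizer is no longer supported in current Oracle releases. One behavioral difference worth checking: PARTITION BY puts NULL values into the same group, whereas the `=` comparisons in the question never match NULLs.

```sql
-- Sketch: keep the row with the lowest ROWID in each duplicate group of FCST.
-- Assumption: the six columns from the question's query define a duplicate.
DELETE FROM FCST
WHERE ROWID IN (
    SELECT rid
    FROM (
        SELECT f.ROWID AS rid,
               ROW_NUMBER() OVER (
                   PARTITION BY f.DMDUNIT, f.DMDGROUP, f.LOC,
                                f.STARTDATE, f.TYPE, UPPER(f.FCSTID)
                   ORDER BY f.ROWID
               ) AS rn
        FROM FCST f
    )
    WHERE rn > 1   -- every row after the first in its group is surplus
);
```

This reads FCST once to number the rows instead of running a correlated subquery per row, which is why it may be cheaper; the actual plan should be confirmed on the real data.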
But if you have a large amount of data, see http://www.rampant-books.com/t_stoever_delete_duplicates.htm or use the following code:
DECLARE -- Code ©2004 by Edward Stoever
   -- One row per key combination that occurs more than once
   CURSOR c_get_duplicates
   IS
      SELECT   ssrfees_term_code, ssrfees_crn, ssrfees_detl_code,
               ssrfees_ftyp_code, ssrfees_levl_code, COUNT (*)
          FROM ssrfees
      GROUP BY ssrfees_term_code,
               ssrfees_crn,
               ssrfees_detl_code,
               ssrfees_ftyp_code,
               ssrfees_levl_code
        HAVING COUNT (*) > 1;

   var_get_duplicates   c_get_duplicates%ROWTYPE;

   -- ROWIDs of the rows in the current duplicate group;
   -- NVL makes NULL values in the two nullable columns compare as equal
   CURSOR c_del_only_one
   IS
      SELECT ROWID
        FROM ssrfees
       WHERE ssrfees_term_code = var_get_duplicates.ssrfees_term_code
         AND ssrfees_crn = var_get_duplicates.ssrfees_crn
         AND ssrfees_detl_code = var_get_duplicates.ssrfees_detl_code
         AND NVL(ssrfees_ftyp_code,'1') = NVL(var_get_duplicates.ssrfees_ftyp_code,'1')
         AND NVL(ssrfees_levl_code,'1') = NVL(var_get_duplicates.ssrfees_levl_code,'1');

   var_del_only_one     ROWID;
BEGIN
   OPEN c_get_duplicates;

   LOOP
      FETCH c_get_duplicates
       INTO var_get_duplicates;

      EXIT WHEN c_get_duplicates%NOTFOUND;

      -- Only the first fetched row of the group is deleted, so a group with
      -- three or more copies needs the block to be run again
      OPEN c_del_only_one;

      FETCH c_del_only_one
       INTO var_del_only_one;

      DELETE FROM ssrfees
            WHERE ROWID = var_del_only_one;

      COMMIT;

      CLOSE c_del_only_one;
   END LOOP;

   CLOSE c_get_duplicates;
END;
/
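Because the block above removes a single row per duplicated key on each run, one possible variation is to collect every surplus ROWID up front and delete in batches. The following is only a sketch under assumptions: the batch size of 10000 and the names `rowid_tab`, `l_rids` and `c_surplus` are illustrative, and it commits once at the end rather than inside the loop. The asker reports that a bulk approach was slower on their data, so this is not guaranteed to be faster than a single DELETE statement.

```sql
DECLARE
   TYPE rowid_tab IS TABLE OF ROWID;   -- hypothetical type name
   l_rids   rowid_tab;

   -- Every row except the first (lowest ROWID) in each duplicate group
   CURSOR c_surplus
   IS
      SELECT rid
        FROM (SELECT ROWID AS rid,
                     ROW_NUMBER () OVER (
                        PARTITION BY ssrfees_term_code, ssrfees_crn,
                                     ssrfees_detl_code, ssrfees_ftyp_code,
                                     ssrfees_levl_code
                        ORDER BY ROWID) AS rn
                FROM ssrfees)
       WHERE rn > 1;
BEGIN
   OPEN c_surplus;

   LOOP
      -- Fetch the next batch of surplus ROWIDs (batch size is an assumption)
      FETCH c_surplus BULK COLLECT INTO l_rids LIMIT 10000;

      EXIT WHEN l_rids.COUNT = 0;

      -- One bulk-bound DELETE per batch instead of one statement per row
      FORALL i IN 1 .. l_rids.COUNT
         DELETE FROM ssrfees
               WHERE ROWID = l_rids (i);
   END LOOP;

   CLOSE c_surplus;
   COMMIT;   -- single commit after all batches
END;
/
```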
Answer 1: (score: 0)
A query containing "ROWID > ..." is fundamentally suspect.
I think what you are looking for is:
DELETE FROM
  FCST f1
WHERE
  ROWID NOT IN (
    SELECT MIN(ROWID)
    FROM FCST f2
    GROUP BY f2.DMDUNIT,
             f2.DMDGROUP,
             f2.LOC,
             f2.STARTDATE,
             f2.TYPE,
             UPPER(f2.FCSTID));
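The subquery identifies a set of ROWIDs that covers every unique combination of values of the columns in the GROUP BY clause, and all other rows are deleted.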
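Before running the DELETE, the same grouping can be used to preview which key combinations are duplicated and how many rows each would lose. This is a minimal sketch; the aliases `FCSTID_UPPER` and `surplus_rows` are illustrative names. Note that GROUP BY treats NULLs as equal, so rows with NULL key columns are grouped together here, unlike with the `=` comparisons in the question's query.

```sql
-- Preview: duplicated key combinations and the number of surplus rows in each
SELECT f.DMDUNIT, f.DMDGROUP, f.LOC, f.STARTDATE, f.TYPE,
       UPPER(f.FCSTID) AS FCSTID_UPPER,
       COUNT(*) - 1    AS surplus_rows
FROM FCST f
GROUP BY f.DMDUNIT, f.DMDGROUP, f.LOC, f.STARTDATE, f.TYPE, UPPER(f.FCSTID)
HAVING COUNT(*) > 1;
```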
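A faster alternative might be to create a new table containing only the rows you want to keep, but if this performs well enough, stick with it.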
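The new-table alternative could be sketched roughly as follows. `FCST_KEEP` and `FCST_OLD` are hypothetical table names, and the swap step assumes nothing else is writing to FCST while it runs.

```sql
-- Sketch only: build a table holding one row per key combination.
CREATE TABLE FCST_KEEP AS
SELECT *
FROM FCST
WHERE ROWID IN (
    SELECT MIN(f2.ROWID)
    FROM FCST f2
    GROUP BY f2.DMDUNIT, f2.DMDGROUP, f2.LOC,
             f2.STARTDATE, f2.TYPE, UPPER(f2.FCSTID)
);

-- After checking the row counts, swap the tables.
-- Indexes, most constraints, grants and triggers on FCST are not carried
-- over by CREATE TABLE ... AS SELECT and would have to be recreated.
RENAME FCST TO FCST_OLD;
RENAME FCST_KEEP TO FCST;
```

Whether this beats an in-place DELETE depends on how large the share of duplicate rows is and on how expensive it is to rebuild the dependent objects.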