Oracle: bulk deleting duplicate records

Time: 2012-07-25 13:16:21

Tags: oracle11g

I have a table that contains 1220200 duplicate records.

I am using the following query to delete the duplicates.

DELETE /*+ NO_CPU_COSTING  */
  FROM  FCST f1
  WHERE
       ROWID >
           (SELECT MIN (ROWID)
              FROM FCST f2
             WHERE 
                  f1.DMDUNIT = f2.DMDUNIT
                   AND f1.DMDGROUP = f2.DMDGROUP
                   AND f1.LOC = f2.LOC
                   AND f1.STARTDATE = f2.STARTDATE
                   AND f1.TYPE = f2.TYPE
                   AND UPPER (f1.FCSTID) = UPPER (f2.FCSTID));

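Deleting these records takes almost 2 minutes. I also tried a bulk-delete approach, loading the duplicates into a cursor and deleting them in bulk, but that took even longer.

What is the best way to optimize this code?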

2 answers:

Answer 0 (score: 0)

One simple approach is this:

delete /*+RULE*/ from t
where rowid in ( select rid
                   from ( select rowid rid,
                                 row_number() over
                                   (partition by cust_seg_nbr order by rowid) rn
                            from t
                        )
                 where rn <> 1 );
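
The statement above uses a generic table t and column cust_seg_nbr. As a sketch only, here is the same ROW_NUMBER() pattern applied to the FCST table from the question, assuming the six columns compared in the question's subquery are what define a duplicate. The RULE hint is left out here because it is not needed to illustrate the pattern.

```sql
-- Sketch: keep the row with the lowest ROWID in each duplicate group
-- and delete the rest. The partition columns are taken from the question.
DELETE FROM fcst
 WHERE ROWID IN (
       SELECT rid
         FROM (SELECT ROWID AS rid,
                      ROW_NUMBER() OVER (
                          PARTITION BY dmdunit, dmdgroup, loc,
                                       startdate, type, UPPER(fcstid)
                          ORDER BY ROWID) AS rn
                 FROM fcst)
        WHERE rn > 1);
```

Running this inside a transaction and checking the reported row count before issuing COMMIT makes it easy to roll back if the number looks wrong.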

But if you have a large amount of data, then

check this link http://www.rampant-books.com/t_stoever_delete_duplicates.htm or use the following code

 DECLARE     -- Code ©2004 by Edward Stoever
   CURSOR c_get_duplicates
   IS
      SELECT   ssrfees_term_code, ssrfees_crn, ssrfees_detl_code,
               ssrfees_ftyp_code, ssrfees_levl_code, COUNT (*)
          FROM ssrfees
        HAVING COUNT (*) > 1
      GROUP BY ssrfees_term_code,
               ssrfees_crn,
               ssrfees_detl_code,
               ssrfees_ftyp_code,
               ssrfees_levl_code;

   var_get_duplicates c_get_duplicates%ROWTYPE;

   CURSOR c_del_only_one
   IS
      SELECT ROWID
        FROM ssrfees
       WHERE ssrfees_term_code = var_get_duplicates.ssrfees_term_code
         AND ssrfees_crn = var_get_duplicates.ssrfees_crn
         AND ssrfees_detl_code = var_get_duplicates.ssrfees_detl_code
         AND NVL(ssrfees_ftyp_code,'1') = NVL(var_get_duplicates.ssrfees_ftyp_code,'1')
         AND NVL(ssrfees_levl_code,'1') = NVL(var_get_duplicates.ssrfees_levl_code,'1');

   var_del_only_one ROWID;
BEGIN
   OPEN c_get_duplicates;

   LOOP
      FETCH c_get_duplicates
       INTO var_get_duplicates;

      EXIT WHEN c_get_duplicates%NOTFOUND;

      OPEN c_del_only_one;

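      -- Only the first matching row is fetched and deleted, so each run
      -- removes one row per duplicate group; rerun until none remain.
      FETCH c_del_only_one
       INTO var_del_only_one;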

      DELETE FROM ssrfees
            WHERE ROWID = var_del_only_one;

      COMMIT;

      CLOSE c_del_only_one;
   END LOOP;

   CLOSE c_get_duplicates;
END;
/

Answer 1 (score: 0)

A query containing "ROWID > ..." is fundamentally suspect.

I think what you are looking for is:

DELETE FROM
  FCST f1
WHERE
  ROWID NOT IN (
    SELECT   MIN(ROWID)
    FROM     FCST f2
    GROUP BY f2.DMDUNIT,
             f2.DMDGROUP,
             f2.LOC,
             f2.STARTDATE,
             f2.TYPE,
             UPPER(f2.FCSTID));

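The subquery identifies one ROWID for each unique combination of values in the GROUP BY columns, and the DELETE removes every row whose ROWID is not in that set.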
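
Before running the DELETE, the same predicate can be used to preview how many rows it would remove. This is a minimal sketch; the alias rows_to_delete is made up for illustration.

```sql
-- Sketch: count the rows the DELETE above would remove, without deleting.
SELECT COUNT(*) AS rows_to_delete
  FROM fcst f1
 WHERE ROWID NOT IN (
       SELECT   MIN(ROWID)
       FROM     fcst f2
       GROUP BY f2.dmdunit,
                f2.dmdgroup,
                f2.loc,
                f2.startdate,
                f2.type,
                UPPER(f2.fcstid));
```

One caveat: GROUP BY places NULLs in the same group, whereas the equality comparisons in the question's correlated subquery never match NULL. If any of these columns is nullable, the two statements can therefore treat different rows as duplicates.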

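A faster alternative might be to create a new table containing only the rows you want to keep, but if this performs well enough, stick with it.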
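
The new-table alternative could look roughly like the following. This is a sketch under assumptions, not a tested migration: fcst_keep and fcst_old are hypothetical names, and it assumes nothing else writes to FCST while the copy runs.

```sql
-- Sketch: copy only the rows to keep into a new table.
-- fcst_keep and fcst_old are hypothetical names chosen for this example.
CREATE TABLE fcst_keep AS
SELECT *
  FROM fcst
 WHERE ROWID IN (SELECT   MIN(ROWID)
                 FROM     fcst f2
                 GROUP BY f2.dmdunit, f2.dmdgroup, f2.loc,
                          f2.startdate, f2.type, UPPER(f2.fcstid));

-- After checking the row counts, swap the tables.
ALTER TABLE fcst RENAME TO fcst_old;
ALTER TABLE fcst_keep RENAME TO fcst;
```

Keep in mind that CREATE TABLE ... AS SELECT does not carry over indexes, most constraints, triggers, or grants, so those would have to be recreated on the new table before the old one is dropped.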