Question

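I have an employee table, emp, with 30000 records. I need to delete duplicate records based on the concatenation of two columns, for example name and job: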

martin clerk
martin clerk

Here is my code:

declare
    type typ_emp is table of emp%rowtype;

    v_emp   typ_emp;

    cursor cur_emp
    is
          select *
            from emp a
           where rowid >
                     (select min (rowid)
                        from emp b
                       where concat (concat (b.ename, '-'), b.job) =
                             concat (concat (a.ename, '-'), a.job)
                     )
               ;
begin
    open cur_emp;

    loop
        fetch cur_emp bulk collect into v_emp;

        exit when v_emp.count = 0;

        if v_emp.count > 0
        then
           for i in v_emp.first .. v_emp.last
           loop
               insert into backup_emp (ename, job)
               values (v_emp (i).ename, v_emp (i).job)
                    ;
           end loop;
        end if;
    end loop;

    close cur_emp;

    delete
      from emp s
     where s.rowid >
              any (select t.rowid
                     from emp t
                    where concat (concat (t.ename, '-'), t.job) =
                             concat (concat (s.ename, '-'), s.job));

    commit;
exception
    when others then
        Raise;
end;

Deleting the records takes a very long time. Can anyone help me tune the query or suggest a better approach?

Thanks in advance.

Answer 1

Creating a function-based index may improve performance:

 CREATE INDEX concatindex ON emp (ename||'-'||job);

The delete statement would then look like this:

delete emp a
 where a.rowid > (select min(b.rowid)
                    from emp b
                   where b.ename||'-'||b.job = a.ename||'-'||a.job);

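That is, unless you also need to insert the deleted rows into a backup table, which is not clear from your question. If you do, I would rather gather the deleted rows into a collection. Leave a comment if you need me to elaborate on this option.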
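If the deleted rows do need to go into backup_emp, one possible sketch of the collection approach is below. It is untested and rests on assumptions: that backup_emp only needs the ename and job columns (as in the insert from the question), and that the deleted rows fit in memory. The type and variable names are made up for illustration. The delete hands the removed rows back through RETURNING ... BULK COLLECT INTO, and FORALL then writes them to the backup table in one bulk operation.

```sql
declare
    -- Hypothetical collection types, one per column copied to the backup table
    type typ_ename is table of emp.ename%type;
    type typ_job   is table of emp.job%type;

    v_enames   typ_ename;
    v_jobs     typ_job;
begin
    -- Delete every row except the one with the lowest rowid per ename/job pair,
    -- and capture the deleted values in the same statement
    delete from emp a
     where a.rowid > (select min (b.rowid)
                        from emp b
                       where b.ename || '-' || b.job = a.ename || '-' || a.job)
    returning ename, job
      bulk collect into v_enames, v_jobs;

    -- Bulk insert of the captured rows; does nothing when no rows were deleted
    forall i in 1 .. v_enames.count
        insert into backup_emp (ename, job)
        values (v_enames (i), v_jobs (i));

    commit;
end;
/
```

This replaces both the cursor loop and the separate delete from the question, so emp is scanned for duplicates once instead of twice.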

Answer 2

Here is my change to the code:

cursor cur_emp
    is
          select *
            from (select b.*,
                         row_number () over (partition by concat (concat (b.ename, '-'), b.job)
                                             order by b.ename) cnt
                    from emp b)
           where cnt > 1;
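Note that this cursor returns the extra cnt column, so its rows no longer match emp%rowtype, and it only identifies the duplicates without deleting them. As a rough, untested sketch of how the same row_number() idea could drive the delete itself, the inner query can expose rowid (aliased here as rid, a name chosen only for illustration) and order by it so that the lowest rowid in each group is the one kept:

```sql
-- Delete every row whose position within its ename/job group is greater than 1
delete from emp
 where rowid in (select rid
                   from (select rowid as rid,
                                row_number () over (partition by ename || '-' || job
                                                    order by rowid) as rn
                           from emp)
                  where rn > 1);
```

Ordering by rowid instead of ename makes the choice of surviving row deterministic, since every row in a partition has the same ename.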

Answer 3

I hope this helps.

SELECT ROWID, ename || '-' || job AS concatenation,
       decode(rank() over(PARTITION BY ename || '-' || job ORDER BY ROWID), 1, 'keep', 'delete') AS to_do
  FROM emp
 ORDER BY ename || '-' || job, ROWID;
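The query above lists every row with a keep or delete marker. If only a summary is wanted, for instance to check before and after the delete, a simple aggregate along these lines (a sketch, with copies as an arbitrary alias) shows which concatenated values still occur more than once:

```sql
-- One row per duplicated ename/job combination, with the number of copies
SELECT ename || '-' || job AS concatenation,
       COUNT(*) AS copies
  FROM emp
 GROUP BY ename || '-' || job
HAVING COUNT(*) > 1;
```

Once the duplicates have been removed, this should return no rows.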

Delete duplicate records based on the concatenated value of two columns