Question

I have 2 delete statements that take a long time to complete. There are several indexes on the columns in the where clause.

What is a duplicate? If 2 or more records have the same values in the columns id, cid, type, trefid, ordrefid, amount and paydt, they are duplicates.

The deletes remove about 1 million records.

Can they be rewritten in any way to make them faster?

DELETE FROM TABLE1 A WHERE loaddt < (
    SELECT max(loaddt) FROM TABLE1 B
    WHERE 
    a.id=b.id and
    a.cid=b.cid and
    NVL(a.type,'-99999') = NVL(b.type,'-99999') and
    NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
    NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
    NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
    NVL(a.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))=NVL(b.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))
);

    COMMIT;

DELETE FROM TABLE1 a where rowid > (
    Select min(rowid) from TABLE1 b
    WHERE 
    a.id=b.id and
    a.cid=b.cid and
    NVL(a.type,'-99999') = NVL(b.type,'-99999') and
    NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
    NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
    NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
    NVL(a.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))=NVL(b.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))
);

commit;

Explain plan:

DELETE  TABLE1         

    HASH JOIN 1296491 
    Access Predicates 

        AND 
        A.ID=ITEM_1 
        A.CID=ITEM_2 
        ITEM_3=NVL(TYPE,'-99999') 
        ITEM_4=NVL(TREFID,'-99999') 
        ITEM_5=NVL(ORDREFID,'-99999') 
        ITEM_6=NVL(AMOUNT,(-99999)) 
        ITEM_7=NVL(PAYDT,TO_DATE(' 9999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 

    Filter Predicates 
        LOADDT<MAX(LOADDT)

    TABLE ACCESS  TABLE1     FULL    267904 
    VIEW VW_SQ_1         690385 
    SORT GROUP BY    690385 
        TABLE ACCESS TABLE1      FULL    267904

Answer 1

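How big is the table? If the number of deleted rows is up to about 12% of the table, an index may be worth considering. Could you partition the table somehow, for example week by week, and then scan only the current week?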
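To make the partitioning idea concrete, here is a minimal sketch of a weekly interval-partitioned copy of the table. It assumes the Partitioning option and 11g or later. The table name table1_part, the column data types, and the boundary date of the first partition are all placeholders, since the question does not show the real DDL.

```sql
-- Hypothetical DDL: the table name, column types and initial boundary are assumptions.
CREATE TABLE table1_part (
  id        NUMBER        NOT NULL,
  cid       NUMBER        NOT NULL,
  type      VARCHAR2(30),
  trefid    VARCHAR2(30),
  ordrefid  VARCHAR2(30),
  amount    NUMBER,
  paydt     DATE,
  loaddt    DATE          NOT NULL   -- interval partitioning cannot store a NULL partition key
)
PARTITION BY RANGE (loaddt)
INTERVAL (NUMTODSINTERVAL(7, 'DAY'))  -- one new partition per 7-day range, created on demand
(
  PARTITION p_initial VALUES LESS THAN (DATE '2013-01-01')
);
```

With a layout like this, a predicate on loaddt lets Oracle prune to the matching weekly partitions instead of scanning the whole table.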

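The following rewrite may be more efficient. When you use an aggregate function, Oracle must walk through all relevant rows (a full scan in your case), but EXISTS stops at the first match. (And of course the query would be much faster with a single function-based index, needed because of the NVL calls, covering all columns in the where clause.)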

DELETE FROM TABLE1 A 
WHERE exists (
SELECT 1 
FROM TABLE1 B
WHERE 
a.loaddt < b.loaddt and  -- "<" keeps the newest row; "!=" would delete it as well
a.id=b.id and
a.cid=b.cid and
NVL(a.type,'-99999') = NVL(b.type,'-99999') and
NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
NVL(a.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))=NVL(b.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))
);
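The function-based index mentioned above might look like the sketch below. The index name is hypothetical, and the expressions simply copy the NVL calls from the question so that they match the predicates; the actual column data types are not shown in the question, so treat this as a starting point rather than tested DDL.

```sql
-- Hypothetical index name; the expressions mirror the predicates in the DELETE.
CREATE INDEX table1_dedup_fbi ON table1 (
  id,
  cid,
  NVL(type, '-99999'),
  NVL(trefid, '-99999'),
  NVL(ordrefid, '-99999'),
  NVL(amount, '-99999'),
  NVL(paydt, TO_DATE('9999-12-31', 'YYYY-MM-DD')),
  loaddt   -- included so the loaddt comparison can be answered from the index
);

-- Gather statistics so the optimizer can cost the new index expressions
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE1', cascade => TRUE);
END;
/
```

Keep in mind that an extra index also has to be maintained for every deleted row, so it is worth comparing timings with and without it.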

Answer 2

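Although some may disagree, I am a proponent of running large, long-running deletes procedurally. In my view it is much easier to control and track progress (and your DBA will like you better ;-). Also, I am not sure why you need to join table1 to itself to identify duplicates (and I would be curious whether you ever run into "snapshot too old" issues with your current approach). You also should not need multiple delete statements; all duplicates should be handled in one process. Finally, you should check why you are constantly re-introducing duplicates each week, and perhaps change the load process (maybe doing a merge/upsert rather than all inserts).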
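As an illustration of the merge/upsert idea, the sketch below loads from a staging table and only inserts rows that are not already present. The staging table name table1_stage and the choice to refresh loaddt on a match are assumptions; the real load process is not shown in the question.

```sql
-- Hypothetical load step: table1_stage is an assumed staging table that
-- holds the week's new rows and contains no duplicates of its own.
MERGE INTO table1 t
USING table1_stage s
ON (
      t.id  = s.id
  AND t.cid = s.cid
  AND NVL(t.type, '-99999')     = NVL(s.type, '-99999')
  AND NVL(t.trefid, '-99999')   = NVL(s.trefid, '-99999')
  AND NVL(t.ordrefid, '-99999') = NVL(s.ordrefid, '-99999')
  AND NVL(t.amount, '-99999')   = NVL(s.amount, '-99999')
  AND NVL(t.paydt, TO_DATE('9999-12-31', 'YYYY-MM-DD'))
      = NVL(s.paydt, TO_DATE('9999-12-31', 'YYYY-MM-DD'))
)
WHEN MATCHED THEN
  UPDATE SET t.loaddt = s.loaddt      -- assumption: keep one row and refresh its load date
WHEN NOT MATCHED THEN
  INSERT (id, cid, type, trefid, ordrefid, amount, paydt, loaddt)
  VALUES (s.id, s.cid, s.type, s.trefid, s.ordrefid, s.amount, s.paydt, s.loaddt);
```

If the staging table itself contains duplicate rows, a MERGE like this can fail with ORA-30926 or insert both copies, so the staging data would need to be deduplicated first.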

That said, you might try something like:

-- first create mat view to find all duplicates
create materialized view my_dups_mv
tablespace my_tablespace
build immediate
refresh complete on demand
as
select id,cid,type,trefid,ordrefid,amount,paydt, count(1) as cnt
from table1
group by id,cid,type,trefid,ordrefid,amount,paydt
having count(1) > 1;

-- dedup data (or put into procedure and schedule along with mat view refresh above)
declare
  -- make sure my_dups_mv is refreshed first
  cursor dup_cur is
  select * from my_dups_mv;

  type duprec_t is record(row_id rowid);
  duprec duprec_t;
  type duptab_t is table of duprec_t index by pls_integer;
  duptab duptab_t;

  l_ctr pls_integer := 0;
  l_dupcnt pls_integer := 0;
begin
  for rec in dup_cur
  loop
    l_ctr := l_ctr + 1;

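    -- assuming needed indexes exist; note that "=" never matches NULL, so a
    -- group with NULL in any of these columns returns no rows here and is skipped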
    select rowid
    bulk collect into duptab
    from table1
    where id = rec.id
    and cid = rec.cid
    and type = rec.type
    and trefid = rec.trefid
    and ordrefid = rec.ordrefid
    and amount = rec.amount
    and paydt = rec.paydt
    -- order by whatever makes sense to make the "keeper" float to top
    order by loaddt desc
    ;

    for i in 2 .. duptab.count
    loop
      l_dupcnt := l_dupcnt + 1;
      delete from table1 where rowid = duptab(i).row_id;
    end loop;

    if (mod(l_ctr, 10000) = 0) then
      -- log to log table here (calling autonomous procedure you'll need to implement)
      insert_logtable('Table1 deletes', 'Commit reached, deleted ' || l_dupcnt || ' rows');
      commit;
    end if;

  end loop;
  commit;
end;

Check the log table for progress status.
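For comparison, the point about not needing a self-join or several delete statements can also be sketched as a single statement using the ROW_NUMBER analytic function. This is a different technique from the procedural block above, not the answer author's code. It keeps the row with the latest loaddt in each group; breaking ties by rowid is an assumption.

```sql
DELETE FROM table1
WHERE rowid IN (
  SELECT rid
  FROM (
    SELECT rowid AS rid,
           ROW_NUMBER() OVER (
             -- PARTITION BY puts NULLs in the same group, so no NVL is needed
             PARTITION BY id, cid, type, trefid, ordrefid, amount, paydt
             ORDER BY loaddt DESC NULLS LAST, rowid
           ) AS rn
    FROM table1
  )
  WHERE rn > 1   -- everything except the keeper in each group
);
```

This runs as one transaction, so it gives up the progress logging and incremental commits that the procedural version provides.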

Answer 3

1. Parallel

alter session enable parallel dml;

DELETE /*+ PARALLEL */ FROM TABLE1 A WHERE loaddt < (
...

This assumes you have Enterprise Edition, a sane server configuration, and that you are on 11g. If you are not on 11g, the parallel syntax is slightly different.
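A quick way to confirm that the session setting took effect is to query the session's parallel DML status. The sketch also shows the object-level hint form that names the table alias and a degree, here applied to the second delete from the question. The degree of 4 is an arbitrary example value, and reading v$session assumes the account has access to that view.

```sql
ALTER SESSION ENABLE PARALLEL DML;

-- Should report ENABLED for the current session
SELECT pdml_status
FROM   v$session
WHERE  sid = SYS_CONTEXT('USERENV', 'SID');

-- Object-level hint: table alias plus degree (4 is only an example value)
DELETE /*+ PARALLEL(a, 4) */ FROM table1 a
WHERE  rowid > (
  SELECT MIN(rowid)
  FROM   table1 b
  WHERE  a.id  = b.id
  AND    a.cid = b.cid
  AND    NVL(a.type, '-99999')     = NVL(b.type, '-99999')
  AND    NVL(a.trefid, '-99999')   = NVL(b.trefid, '-99999')
  AND    NVL(a.ordrefid, '-99999') = NVL(b.ordrefid, '-99999')
  AND    NVL(a.amount, '-99999')   = NVL(b.amount, '-99999')
  AND    NVL(a.paydt, TO_DATE('9999-12-31', 'YYYY-MM-DD'))
         = NVL(b.paydt, TO_DATE('9999-12-31', 'YYYY-MM-DD'))
);

-- After parallel DML the transaction must be committed (or rolled back)
-- before the same session can read the table again.
COMMIT;
```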

2. Reduce memory requirements

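The plan shows a hash join, which is probably a good thing. But without any useful filters, Oracle has to hash the entire table. (Tbone's query, which only uses a GROUP BY, looks nicer and may run faster. But it will probably run into the same problem of trying to sort or hash the entire table.)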

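If the hash cannot fit in memory it must be written to disk, which can be very slow. Since you run this query every week, only one of the two table references needs to look at all the rows. Depending on exactly when it runs, you can add a condition such as "and b.loaddt >= sysdate - 14" to the end of the subquery. This may significantly reduce the amount written to temporary tablespace. It may also reduce the read IO if you use a partitioning strategy like the one jakub.petr suggested.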
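Applied to the first delete from the question, the extra filter could be placed as follows. The 14-day window is the example value from the text above; the sketch assumes every duplicate group worth cleaning has at least one copy loaded within that window.

```sql
DELETE FROM table1 a
WHERE  loaddt < (
  SELECT MAX(loaddt)
  FROM   table1 b
  WHERE  a.id  = b.id
  AND    a.cid = b.cid
  AND    NVL(a.type, '-99999')     = NVL(b.type, '-99999')
  AND    NVL(a.trefid, '-99999')   = NVL(b.trefid, '-99999')
  AND    NVL(a.ordrefid, '-99999') = NVL(b.ordrefid, '-99999')
  AND    NVL(a.amount, '-99999')   = NVL(b.amount, '-99999')
  AND    NVL(a.paydt, TO_DATE('9999-12-31', 'YYYY-MM-DD'))
         = NVL(b.paydt, TO_DATE('9999-12-31', 'YYYY-MM-DD'))
  AND    b.loaddt >= SYSDATE - 14   -- limits the subquery side to recently loaded rows
);
```

Groups with no copy inside the window produce a NULL from the subquery, so their rows are left untouched rather than deleted.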

3. Active report

If you want to know exactly what your query is doing, run the Active Report:

select dbms_sqltune.report_sql_monitor(sql_id => 'YOUR_SQL_ID_HERE', type => 'active')
from dual;

(Save the output to an .html file and open it with a browser.)
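One way to find the SQL_ID and save the report is a small SQL*Plus script along these lines. The output file name and the LIKE pattern are placeholders, and the sketch assumes the account can read v$sql_monitor and that the database is licensed for SQL monitoring (Tuning Pack).

```sql
-- Find the SQL_ID of the monitored delete (the pattern is only an example)
SELECT sql_id, status, sql_exec_start
FROM   v$sql_monitor
WHERE  UPPER(sql_text) LIKE 'DELETE%TABLE1%'
ORDER  BY sql_exec_start DESC;

-- SQL*Plus settings so the CLOB is written out in full and without page breaks
SET LONG 1000000 LONGCHUNKSIZE 1000000 LINESIZE 1000 PAGESIZE 0
SET TRIMSPOOL ON FEEDBACK OFF ECHO OFF

SPOOL sql_monitor_report.html
SELECT dbms_sqltune.report_sql_monitor(sql_id => 'YOUR_SQL_ID_HERE', type => 'active')
FROM   dual;
SPOOL OFF
```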

Optimizing Oracle delete statements