Question

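I created a Spring Integration workflow that loads data from CSV files into an Oracle database. It runs in a clustered environment; each node processes one CSV file and loads its data into a temp table.

Structure of the temp table (index on AccountNumber):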

ID
AccountNumber
ItemId
Value

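I have a Spring Integration RabbitMQ configuration that publishes the file names to a queue. Each node in the cluster picks up only one file, reads the CSV from the file system (the file system and the Oracle database are shared), and loads the data into the TEMP table. (Each CSV is about 2 GB.)

After all the data has been loaded into the temp table, one node should move the data from the temp table into the main table, which has the same structure as the temp table.

Main table (index on AccountNumber):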

Id
AccountNumber
ItemId
Value

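I created a stored procedure that deletes the existing account numbers from the main table and then loads the accounts into the main table.

Once the data has been moved to the main table, I delete the data from the temp table at the end of the procedure.

My question is: what is the best way to truncate the table?

Problem: suppose I have these records in the main table.

Main table: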

Account Number   ItemId   ItemValue
-----------------------------------
123456             5        XYZ
123456             6        ABC
123456             7        DEF

Now I get this entry from the CSV into the temp table:

AccountNumber    ItemId    ItemValue
------------------------------------
123456             5        FGH

Now my main table should contain only one row for this account. The rows with ItemId 6 and 7 should be deleted.

Account Number   ItemId    ItemValue
-------------------------------------
123456             5        FGH

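Can I achieve this with MERGE INTO?

Scenario 1:

Is it better to truncate the table before loading data into the TEMP table? (Two separate database transactions: one to truncate the table and another for the data movement.)

A procedure that cleans the temp table in batches before loading (it is called before the file names are published to the queue).

Step 1: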

create or replace procedure CleanTempTable
IS
  v_numberRows int := 20000;  -- rows deleted per batch
BEGIN
  LOOP
    DELETE FROM TEMP WHERE rownum <= v_numberRows;
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;
  END LOOP;
END;
/

A procedure that moves the data from temp to main.

It is called at the end of the merge phase.

CREATE OR REPLACE PROCEDURE LOAD_DATA_TO_CONSOLIDATE (updatecount OUT NUMBER)
IS
  cnt number := 0;
  account_num MAIN_TABLE.ACCOUNT_NO%TYPE;
  CURSOR account_cursor IS
    SELECT distinct ACCOUNT_NO from TEMP_TABLE;
BEGIN
  OPEN account_cursor;
  LOOP
    FETCH account_cursor INTO account_num;
    EXIT WHEN account_cursor%NOTFOUND;
    -- replace all rows of this account in the main table
    delete from MAIN where ACCOUNT_NO = account_num;
    -- NOTE: four target columns are listed but only three values are selected,
    -- so this INSERT is rejected with ORA-00947 (not enough values) as written
    insert into MAIN(ID,ACCOUNT_NO,FACT_ID,FACT_VALUE)
      select HIBERNATE_SEQUENCE.nextval,temp.ACCOUNT_NO,temp.VALUE from TEMP temp
      where ACCOUNT_NO = account_num;
    cnt := cnt + sql%rowcount;
    commit;
  END LOOP;
  updatecount := cnt;
  CLOSE account_cursor;
END LOAD_DATA_TO_CONSOLIDATE;

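Scenario 2:

Is it better to truncate the table after moving the data from the TEMP table to the main table? (Everything in one stored procedure, in one DB transaction.)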

CREATE OR REPLACE PROCEDURE LOAD_DATA_TO_CONSOLIDATE (updatecount OUT NUMBER)
IS
  cnt number := 0;
  account_num MAIN_TABLE.ACCOUNT_NO%TYPE;
  CURSOR account_cursor IS
    SELECT distinct ACCOUNT_NO from TEMP_TABLE;
BEGIN
  OPEN account_cursor;
  LOOP
    FETCH account_cursor INTO account_num;
    EXIT WHEN account_cursor%NOTFOUND;
    delete from MAIN where ACCOUNT_NO = account_num;
    -- NOTE: four target columns but only three selected values (ORA-00947 as written)
    insert into MAIN(ID,ACCOUNT_NO,FACT_ID,FACT_VALUE)
      select HIBERNATE_SEQUENCE.nextval,temp.ACCOUNT_NO,temp.VALUE from TEMP temp
      where ACCOUNT_NO = account_num;
    cnt := cnt + sql%rowcount;
    commit;
  END LOOP;
  updatecount := cnt;
  CLOSE account_cursor;
  delete from TEMP; -- removing all data
END LOAD_DATA_TO_CONSOLIDATE;

Answer 1

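From my Oracle point of view, I wouldn't recommend either of your options, for these reasons:

they multiply the work you have to do
row-by-row processing is slow
committing inside a loop can lead to ORA-01555 (snapshot too old) errors
deleting (the DELETE command) is always slower than truncating (TRUNCATE)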
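
To illustrate the last point, here is a minimal sketch of emptying the staging table with TRUNCATE instead of a batched DELETE loop. The procedure name truncate_temp_table is hypothetical; TEMP is the table name used in the question. Keep in mind that TRUNCATE is DDL, so it commits implicitly and cannot be rolled back, and from PL/SQL it has to go through EXECUTE IMMEDIATE.

```sql
create or replace procedure truncate_temp_table  -- hypothetical name
is
begin
  -- TRUNCATE is DDL: it performs an implicit commit and cannot be rolled back.
  -- Static DDL is not allowed in PL/SQL, hence the dynamic SQL.
  execute immediate 'truncate table TEMP';
end truncate_temp_table;
/
```

Because of the implicit commit, this would only fit a flow where the staging data is really no longer needed at the moment the procedure is called.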

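Suggestion: to avoid the temp table altogether, use the external table feature, which lets you query your CSV file as if it were an ordinary Oracle table.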
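
As a rough sketch of what such an external table could look like: the directory path, the file name accounts.csv, the column names and the data types below are all assumptions for illustration (the question only says the CSV carries an account number, an item id and a value), so adjust them to the real file layout.

```sql
-- Directory object pointing at the shared file system (path is an assumption)
create or replace directory csv_dir as '/shared/csv';

-- External table over the CSV; column names and types are assumptions
create table external_table (
  account_no  varchar2(20),
  item_id     number,
  item_value  varchar2(100)
)
organization external (
  type oracle_loader
  default directory csv_dir
  access parameters (
    records delimited by newline
    fields terminated by ','
    optionally enclosed by '"'
    missing field values are null
  )
  location ('accounts.csv')   -- hypothetical file name
)
reject limit unlimited;
```

The LOCATION clause accepts a comma-separated list of files, so several CSV files with the same layout could be exposed through one external table.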

That means everything can be done with two statements:

-- Delete rows from the MAIN table whose ACCOUNT_NO exists in the CSV file
delete from main m
where exists (select null 
              from external_table t
              where t.account_no = m.account_no
             );

-- Insert rows into the MAIN table
insert into main (col1, col2, ...)
select col1, col2 from external_table;
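
Applied to the tables in the question, the two statements might look like the sketch below. MAIN, its columns and HIBERNATE_SEQUENCE are taken from the question's procedure; the external table column names item_id and item_value are assumptions.

```sql
-- Remove every row of the accounts that appear in the CSV
delete from main m
where exists (select null
              from external_table t
              where t.account_no = m.account_no
             );

-- Reload those accounts from the CSV
insert into main (id, account_no, fact_id, fact_value)
select hibernate_sequence.nextval, t.account_no, t.item_id, t.item_value
from external_table t;

commit;
```

With the sample data from the question, the DELETE removes the three rows of account 123456 (ItemId 5, 6 and 7) and the INSERT adds back the single row with ItemId 5 and value FGH.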

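Or, and this will probably perform best, UPDATE the values for ACCOUNT_NO values that already exist in the table and INSERT only the rows that don't. Instead of two commands, use a single, fast MERGE statement (also known as an upsert):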

merge into main m
  using (select t.account_no, t.col1, t.col2, ...
         from external_table t
        ) x
on (m.account_no = x.account_no)
when matched then update set m.col1 = x.col1,
                             m.col2 = x.col2, ...
when not matched then insert (account_no, col1, col2, ...)
                      values (x.account_no, x.col1, x.col2, ...);

With MERGE there is no loading here, deleting from there, inserting from here to there... it is very neat and, as I said, fast. No PL/SQL is needed at all.
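
One caveat for the example in the question: a MERGE keyed on account_no alone updates or inserts rows but does not remove the existing rows with ItemId 6 and 7. A possible variant, assuming the pair (ACCOUNT_NO, FACT_ID) identifies a row in MAIN and that the external table columns are named item_id and item_value (both assumptions), is to match on both columns and then remove the leftovers for the loaded accounts:

```sql
merge into main m
  using (select t.account_no, t.item_id, t.item_value
         from external_table t
        ) x
on (m.account_no = x.account_no and m.fact_id = x.item_id)
when matched then update set m.fact_value = x.item_value
when not matched then insert (id, account_no, fact_id, fact_value)
                      values (hibernate_sequence.nextval,
                              x.account_no, x.item_id, x.item_value);

-- Remove rows of loaded accounts whose item no longer appears in the CSV
delete from main m
where exists (select null
              from external_table t
              where t.account_no = m.account_no)
  and not exists (select null
                  from external_table t
                  where t.account_no = m.account_no
                    and t.item_id = m.fact_id);
```

Whether this beats the plain delete-and-insert pair depends on how many rows actually change, so it is worth measuring both on real data.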
