Question

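I have a very large table containing data collected from multiple systems. I now need to delete duplicate records based on conditions across multiple columns.

Here is an example: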

+---------------+-------------+--------+------------+
| System ID     | Debit Num   | Exp Dt | Account NO |
+---------------+-------------+--------+------------+
| pay           | 2222        | 0114   |   111      |
| pay           | 2222        | 0214   |   111      |
| Online        | 2222        | 0214   |   111      |
| Online        | 3333        | 0115   |   222      |
| Online        | 3333        | 0116   |   222      |
| ERP           | 2222        | 0214   |   111      |
| ERP           | 4444        | 0114   |   333      |
+---------------+-------------+--------+------------+

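Based on the data above, I need to remove duplicates subject to the following conditions.

Remove duplicate groups of rows by debit num, exp dt, and account-no, retaining one record with max(exp dt).
The record to retain is chosen by System ID priority: 1) Pay, 2) Online, 3) ERP. For account 111 above, we have records from all three systems, and max(exp dt) for the debit card is 0214 in all three systems. Only the record from Pay with exp dt = 0214 should be retained; the rest should be removed.
For account 222 above, there is no record from Pay, so priority falls to Online and ERP, among which the record with max(exp dt) should be retained.

I have tried several queries found online, such as group by and row_number over, but each of them satisfies only one of the conditions.

Thanks for your help; ideas and suggestions are welcome.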

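EDIT: Gordon's query works fine and meets my requirements, but when I run the same query against staging, which contains 540K rows, it fails with an ORA-00600 internal error.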

Answer 1

I think you can do this using rowid and a correlated subquery:

delete from payinfo_staging_db
   where rowid <> (select r.rid
                   from (select t2.rowid as rid  -- aliased so the outer query can select it
                         from payinfo_staging_db t2
                         where t2.debitNum = payinfo_staging_db.debitNum and
                               t2.accountNo = payinfo_staging_db.accountNo
                         -- latest exp_dt first, then system priority (Pay, Online, ERP)
                         order by t2.exp_dt desc,
                                  (case when t2.SystemId = 'Pay' then 1
                                        when t2.SystemId = 'Online' then 2
                                        when t2.SystemId = 'ERP' then 3
                                   end)
                        ) r
                    where rownum = 1
                  );

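Edit:

There must be some problem with nested references in Oracle. The following works (at least in the sense that it parses and executes correctly):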

delete from payinfo_staging_db
   where rowid <> (select min(rowid) keep (dense_rank first order by exp_dt desc,
                                                            (case when t2.SystemId = 'Pay' then 1
                                                                  when t2.SystemId = 'Online' then 2
                                                                  when t2.SystemId = 'ERP' then 3
                                                             end)
                                          ) as therowid
                   from payinfo_staging_db t2
                   where t2.debitNum = payinfo_staging_db.debitNum and
                         t2.accountNo = payinfo_staging_db.accountNo
                  );
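
As a rough alternative sketch (not part of the tested query above), the same "keep one row per group" rule can be written without a correlated subquery by ranking rows with row_number(). The column types in the scratch table are assumptions, and the case expression compares lower(SystemId) only because the question's sample data spells the system as "pay" while the query above tests for 'Pay'; adjust to however the values are actually stored. Whether this form avoids the ORA-00600 reported in the question is untested.

```sql
-- Scratch copy of the question's sample rows; column types are assumptions.
create table payinfo_staging_db (
    SystemId  varchar2(10),
    debitNum  number,
    exp_dt    varchar2(4),
    accountNo number
);

insert into payinfo_staging_db values ('pay',    2222, '0114', 111);
insert into payinfo_staging_db values ('pay',    2222, '0214', 111);
insert into payinfo_staging_db values ('Online', 2222, '0214', 111);
insert into payinfo_staging_db values ('Online', 3333, '0115', 222);
insert into payinfo_staging_db values ('Online', 3333, '0116', 222);
insert into payinfo_staging_db values ('ERP',    2222, '0214', 111);
insert into payinfo_staging_db values ('ERP',    4444, '0114', 333);

-- Rank rows inside each (debitNum, accountNo) group: latest exp_dt first,
-- then system priority. Every row ranked after the first is deleted.
delete from payinfo_staging_db
where rowid in (
    select rid
    from (select rowid as rid,
                 row_number() over (
                     partition by debitNum, accountNo
                     order by exp_dt desc,
                              case lower(SystemId)
                                  when 'pay'    then 1
                                  when 'online' then 2
                                  when 'erp'    then 3
                              end
                 ) as rn
          from payinfo_staging_db)
    where rn > 1
);

-- Three rows should remain: pay/2222/0214, Online/3333/0116, ERP/4444/0114.
select * from payinfo_staging_db order by accountNo;
```

Running the inner select on its own first, without the delete, shows the rank assigned to each row and is a cheap way to check the rule before removing anything.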

The SQL Fiddle is here.
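
One caveat, offered as an assumption rather than something the question confirms: if exp_dt is stored as a four-character MMYY string, ordering by the raw value compares the month before the year, so '0214' would sort above '0116'. A minimal sketch of a year-first sort key built with substr:

```sql
-- Assumption: exp_dt is a 4-character MMYY string (not confirmed by the question).
-- yymm_key moves the year in front of the month so string order follows date order.
select exp_dt,
       substr(exp_dt, 3, 2) || substr(exp_dt, 1, 2) as yymm_key
from (select '0214' as exp_dt from dual
      union all
      select '0116' from dual)
order by yymm_key desc;
-- '0116' (key '1601') now sorts ahead of '0214' (key '1402').
```

If that assumption holds for the real table, the same expression could stand in for exp_dt in the order by clause of the delete. If exp_dt is already a DATE column, none of this is needed.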
