How do I delete duplicate rows from a table in Oracle using a cursor?

Asked: 2018-08-14 07:47:44

Tags: oracle plsql duplicates cursor self-join

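I want to clean up a table that contains many duplicate records. Each customer number has many records in the table, and they differ in eff_dt (a column name).

I want to keep only one record per customer number.

To do this, I will use the record with the minimum eff_dt for each cust_nbr as the reference. So, for each cust_nbr in the table, I want to copy only the record with the minimum eff_dt into a cursor and then compare that cursor value with the remaining records in the table.

I used the following query when creating the cursor: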

select cust_nbr, min(eff_dt), name, address from cust;

But it gives me the following error:

  

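[Error] Execution (1:8): ORA-00937: not a single-group group function

Please help.

4 answers:

Answer 0 (score: 2)

The error you got means that the non-aggregated columns (cust_nbr, name, address) must be part of the GROUP BY clause, i.e.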

select cust_nbr, min(eff_dt), name, address
from cust
group by cust_nbr, name, address;
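One thing to keep in mind with this query: grouping by name and address returns one row per distinct (cust_nbr, name, address) combination, so a customer whose name or address differs between records still yields several rows. If the cursor should hold exactly one row per cust_nbr, a possible sketch uses Oracle's KEEP (DENSE_RANK FIRST ...) aggregate. It assumes the cust table and column names from the question and has not been run against the real data.

```sql
-- Sketch: one row per cust_nbr, with name and address taken from the
-- row(s) that carry the earliest eff_dt for that customer.
-- If several rows tie on the earliest eff_dt, min() picks among their
-- values separately for each column.
select cust_nbr,
       min(eff_dt)                                          as eff_dt,
       min(name)    keep (dense_rank first order by eff_dt) as name,
       min(address) keep (dense_rank first order by eff_dt) as address
  from cust
 group by cust_nbr;
```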

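P.S. Note that deleting duplicates row by row (in a cursor loop) is slow-by-slow. You are better off switching to some kind of set-based processing. A simple option is: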

delete from cust
      where (cust_nbr,
             eff_dt,
             name,
             address) not in (  select cust_nbr,
                                       min (eff_dt),
                                       name,
                                       address
                                  from cust
                              group by cust_nbr, name, address);
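Because the subquery groups by cust_nbr, name and address, this delete keeps the earliest row of every distinct (cust_nbr, name, address) combination, not necessarily a single row per cust_nbr. A simple check like the sketch below, run before the delete and again before committing, shows which customers still have more than one row. The row_count alias is only illustrative.

```sql
-- Sketch: list every cust_nbr that has more than one row in cust.
select cust_nbr, count(*) as row_count
  from cust
 group by cust_nbr
having count(*) > 1
 order by cust_nbr;
```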

Answer 1 (score: 1)

I'm not sure what the cursor logic is meant to do. I would simply delete the duplicates:

delete cust
where  rowid in
       ( select lead(rowid) over (partition by cust_nbr order by eff_dt)
         from   cust c );
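Here lead(rowid) returns, for each row, the rowid of the next row of the same cust_nbr in eff_dt order, so the subquery collects every row except the earliest one per customer. The same idea can be written with row_number(), which some readers may find easier to follow. This is an equivalent sketch rather than a different method; the aliases rid and rn are made up for the example.

```sql
-- Sketch: number the rows of each cust_nbr by eff_dt and delete
-- everything except the first (earliest) row.
delete from cust
 where rowid in
       (select rid
          from (select rowid as rid,
                       row_number() over (partition by cust_nbr
                                          order by eff_dt) as rn
                  from cust)
         where rn > 1);
```

In both forms, if two rows of a customer share the minimum eff_dt, exactly one of them is kept, and which one is not deterministic unless another column is added to the order by.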

Answer 2 (score: 0)

The following should work for you:

DELETE cust c
 WHERE EXISTS (SELECT 1 FROM cust
                WHERE cust_nbr = c.cust_nbr
                  AND name     = c.name
                  AND address  = c.address
                  AND eff_dt   < c.eff_dt);

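Answer 3 (score: 0)

If I understand correctly, you have two sets of data to delete:

  • All rows from before the most recent change to the customer data (name, address, ...)
  • All later rows in which only eff_dt changed and everything else stayed the same.

In that case, you can use two analytic functions to find the minimum date of the most recent change in the customer data: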

create table test_tab(id number, eff_dt date, name varchar2(20), address varchar2(50));

insert into test_tab values (1, to_date('01-jul-2018', 'dd-mon-yyyy'), 'Name 1', 'Address 1');
insert into test_tab values (1, to_date('15-jul-2018', 'dd-mon-yyyy'), 'Name 1', 'Address 1');
insert into test_tab values (1, to_date('01-aug-2018', 'dd-mon-yyyy'), 'Name 1 changed', 'Address 1 changed');
insert into test_tab values (1, to_date('05-aug-2018', 'dd-mon-yyyy'), 'Name 1 changed', 'Address 1 changed');
insert into test_tab values (1, to_date('10-aug-2018', 'dd-mon-yyyy'), 'Name 1 changed', 'Address 1 changed');
insert into test_tab values (2, to_date('12-jul-2018', 'dd-mon-yyyy'), 'Name 2', 'Address 2');
insert into test_tab values (2, to_date('18-jul-2018', 'dd-mon-yyyy'), 'Name 2', 'Address 2');
insert into test_tab values (3, to_date('15-jul-2018', 'dd-mon-yyyy'), 'Name 3', 'Address 3');
insert into test_tab values (3, to_date('18-jul-2018', 'dd-mon-yyyy'), 'Name 3 changed', 'Address 3 changed');
insert into test_tab values (3, to_date('25-jul-2018', 'dd-mon-yyyy'), 'Name 3 changed again', 'Address 3 changed again');
insert into test_tab values (3, to_date('12-aug-2018', 'dd-mon-yyyy'), 'Name 3 changed again', 'Address 3 changed again');

select id, eff_dt, name, address -- , rn, min_eff_dt
  from (select id, eff_dt, name, address, -- min_eff_dt,
               row_number() over (partition by id order by min_eff_dt desc, eff_dt) rn -- highest minimum date first: that is the date of the last change in the data (apart from eff_dt); eff_dt breaks ties within that version
          from (select id, eff_dt, name, address,
                       min(eff_dt) over (partition by id, name, address order by eff_dt) min_eff_dt -- minimum date of each version of the customer's data
                  from test_tab))
 where rn = 1;

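You can test the script by removing where rn = 1, adding min_eff_dt to the second select statement, and adding rn, min_eff_dt to the topmost select statement, so that you can see the results of the analytic functions.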
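Written out, that test query could look like the sketch below. It runs against the test_tab sample data above; the eff_dt tie-breaker in row_number() and the final order by are additions for readable, repeatable output and are not required by the approach itself.

```sql
-- Sketch: expose min_eff_dt and rn to inspect the analytic functions.
select id, eff_dt, name, address, rn, min_eff_dt
  from (select id, eff_dt, name, address, min_eff_dt,
               -- rn = 1 marks the row that would be kept for each id
               row_number() over (partition by id
                                  order by min_eff_dt desc, eff_dt) rn
          from (select id, eff_dt, name, address,
                       -- earliest date of each (id, name, address) version
                       min(eff_dt) over (partition by id, name, address
                                         order by eff_dt) min_eff_dt
                  from test_tab))
 order by id, rn;
```

For id 1 in the sample data, the three 'Name 1 changed' rows should share a min_eff_dt of 01-Aug-2018 and the two 'Name 1' rows a min_eff_dt of 01-Jul-2018, so the 'Name 1 changed' rows are ranked first.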

You can then use a delete like the one in William's answer:

delete from test_tab
 where rowid in
         (select rid
            from (select rid,
                         row_number() over (partition by id order by min_eff_dt desc, eff_dt) rn -- highest minimum date first: that is the date of the last change in the data (apart from eff_dt); eff_dt breaks ties within that version
                    from (select rowid rid, id, eff_dt, -- rowid must be aliased to pass through the inline views
                                 min(eff_dt) over (partition by id, name, address order by eff_dt) min_eff_dt -- minimum date of each version of the customer's data
                            from test_tab))
           where rn > 1);