Question

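I need to delete roughly 300,000 duplicates from my database. I want to check the Card_id column for duplicates, then check for duplicate timestamps, and then delete one copy and keep one. For example: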

| Card_id | Time |    
| 1234    | 5:30 |     
| 1234    | 5:45 |    
| 1234    | 5:30 |    
| 1234    | 5:45 |

The remaining data should be:

| Card_id | Time |     
| 1234    | 5:30 |     
| 1234    | 5:45 |

I have tried several different DELETE statements, as well as merging into a new table, but with no luck.

Update: got it working!

After many failed attempts, I got this working for DB2:

delete from(
select card_id, time, row_number() over (partition by card_id, time)  rn
from card_table) as A
where rn > 1

rn increments whenever card_id and time are duplicated, so the duplicate rows (those with rn greater than 1) are deleted.
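To make the numbering visible, here is a minimal sketch of the same idea run against an in-memory SQLite database from Python. It assumes SQLite 3.25 or later (for window functions) and a hypothetical card_table seeded with the question's sample rows. SQLite cannot delete through a subquery the way the DB2 statement does, so the sketch deletes by rowid instead; that substitution is an assumption of this example, not part of the DB2 solution.

```python
import sqlite3

# Hypothetical in-memory table mirroring the question's example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card_table (card_id INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO card_table VALUES (?, ?)",
    [(1234, "5:30"), (1234, "5:45"), (1234, "5:30"), (1234, "5:45")],
)

# Number the rows inside each (card_id, time) group; rn restarts at 1 per group.
numbered = conn.execute(
    """
    SELECT card_id, time,
           ROW_NUMBER() OVER (PARTITION BY card_id, time) AS rn
    FROM card_table
    ORDER BY card_id, time, rn
    """
).fetchall()

# Delete every row whose rn is greater than 1, identified by its rowid
# (SQLite-specific stand-in for DB2's "delete from (subquery)").
conn.execute(
    """
    DELETE FROM card_table
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY card_id, time) AS rn
            FROM card_table
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute(
    "SELECT card_id, time FROM card_table ORDER BY time"
).fetchall()
```

Because the window has no ORDER BY, which copy within a group receives rn = 1 is arbitrary. That is harmless here, since the copies are identical in both columns.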

Answer 1

I would strongly suggest this approach:

create temporary table tokeep as
    select distinct card_id, time
    from t;

truncate table t;

insert into t(card_id, time)
    select *
    from tokeep;

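That is, store the data you want to keep, truncate the table, and then repopulate it. Truncating the table, rather than dropping it, preserves the triggers, permissions, and anything else associated with the table.

This approach should also be faster than deleting many duplicates individually.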
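The store, empty, and repopulate sequence can be tried end to end with a small sketch like the one below. It uses Python's sqlite3 module with a hypothetical table t; the extra 5678 row is made up for illustration. SQLite has no TRUNCATE statement, so an unqualified DELETE stands in for it here, which is an assumption of the sketch rather than the answer's recommendation.

```python
import sqlite3

# Hypothetical table t with duplicated (card_id, time) pairs plus one unique row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (card_id INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1234, "5:30"), (1234, "5:45"),
        (1234, "5:30"), (1234, "5:45"),
        (5678, "6:00"),
    ],
)

# Step 1: store one copy of every distinct row.
conn.execute(
    "CREATE TEMPORARY TABLE tokeep AS SELECT DISTINCT card_id, time FROM t"
)

# Step 2: empty the original table (stand-in for TRUNCATE TABLE t).
conn.execute("DELETE FROM t")

# Step 3: repopulate it from the stored copy.
conn.execute("INSERT INTO t (card_id, time) SELECT card_id, time FROM tokeep")
conn.commit()

rows = conn.execute(
    "SELECT card_id, time FROM t ORDER BY card_id, time"
).fetchall()
```

Note that on many database systems TRUNCATE cannot be rolled back, so it is worth confirming that tokeep holds the expected rows before emptying the original table.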

If you are going to do this, you should also add a proper id column:

create temporary table tokeep as
    select distinct card_id, time
    from t;

truncate table t;

alter table t add column id int auto_increment primary key;

insert into t(card_id, time)
    select *
    from tokeep;

Answer 2

If you do not have a primary key or candidate key, there is probably no way to do this with a single command. Try the solution below.

Create a table containing the duplicates:

  select Card_id,Time
  into COPY_YourTable
  from YourTable
  group by Card_id,Time
  having count(1)>1

Using COPY_YourTable, delete every copy of the duplicated rows:

  delete from YourTable
  where exists 
   (
     select 1
     from COPY_YourTable c
     where  c.Card_id = YourTable.Card_id
     and c.Time = YourTable.Time
   )

Copy the deduplicated data back:

   insert into YourTable
   select Card_id,Time
   from COPY_YourTable
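The three steps above can be sketched together as follows, again using Python's sqlite3 module. The sample rows, including the unique 5678 row, are hypothetical. SQLite does not support SELECT ... INTO, so CREATE TABLE ... AS is used for the first step; that is a dialect substitution assumed for this example.

```python
import sqlite3

# Hypothetical data: two duplicated pairs and one row that appears only once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Card_id INTEGER, Time TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [
        (1234, "5:30"), (1234, "5:45"),
        (1234, "5:30"), (1234, "5:45"),
        (5678, "6:00"),
    ],
)

# Step 1: one row per (Card_id, Time) pair that occurs more than once.
conn.execute(
    """
    CREATE TABLE COPY_YourTable AS
    SELECT Card_id, Time
    FROM YourTable
    GROUP BY Card_id, Time
    HAVING COUNT(1) > 1
    """
)

# Step 2: remove every copy of those duplicated pairs from the original.
conn.execute(
    """
    DELETE FROM YourTable
    WHERE EXISTS (
        SELECT 1
        FROM COPY_YourTable c
        WHERE c.Card_id = YourTable.Card_id
          AND c.Time = YourTable.Time
    )
    """
)

# Step 3: put back a single copy of each pair.
conn.execute("INSERT INTO YourTable SELECT Card_id, Time FROM COPY_YourTable")
conn.commit()

rows = conn.execute(
    "SELECT Card_id, Time FROM YourTable ORDER BY Card_id, Time"
).fetchall()
```

Rows that were never duplicated, such as the 5678 row here, are not touched by any of the three steps, so only the duplicated pairs are rewritten.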

SQL: check one column for duplicates and delete based on another column
