Question

I have a large table named Clientescrm with two columns: Idclientecrm (primary key; auto_increment) and CUIT.

I want to find duplicate CUIT values, but I have two problems:

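The data format varies: sometimes it is written as 20-12344567-6, other times as 20_12344567_6 or 20123446576. I want to strip all the symbols before analyzing the data.
The other thing I need is the Idclientecrm value (the primary key).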

I'm using something like this:

select replace(replace(cuit, '-', ''),'_','') as cuit, count(cuit) as duplicates
from clientescrm
group by cuit
having count(cuit) > 1

This query is missing the primary key (idclientecrm), which I also need.

I would like the result table to look like this:

| idclientecrm | cuit | duplicates |
| 1 | 20123456786 | 2 |
| 2 | 20123456786 | 2 |
| 3 | 23123456787 | 3 |
| 4 | 23123456787 | 3 |
| 5 | 23123456787 | 3 |
| 6 | 27123456783 | 2 |
| 7 | 27123456783 | 2 |
| 8 | 20111111116 | 3 |
| 9 | 20111111116 | 3 |
| 10 | 20111111116 | 3 |

Thanks in advance for your help.

Answer 1

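You can take your query and add min(idclientecrm) to the select list. This assumes you want to keep the row with the lowest ID. Basically, you delete every record whose ID does not appear in the query's results...

select replace(replace(cuit, '-', ''),'_','') as cuit, 
       count(*) as duplicates, min(idclientecrm) as IDtoSave
from clientescrm
group by replace(replace(cuit, '-', ''),'_','')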

Note that I dropped the count(*) > 1 condition, so that a CUIT appearing only once is still included and you don't delete single occurrences.

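Then you would delete from the table the rows whose idclientecrm is not in the subquery. This select previews the rows that would be removed:

select * from clientescrm
where idclientecrm not in 
(
    select min(idclientecrm) as IDtoSave
    from clientescrm
    group by replace(replace(cuit, '-', ''),'_','')
)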

Be sure to back up your data first, and I'd suggest running a few SELECT queries to make sure your list of IDs looks right.
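
Once the preview looks right, the delete itself could be sketched roughly as follows. This assumes MySQL, which rejects a subquery that reads directly from the table being deleted from (error 1093), so the grouped subquery is wrapped in a derived table. The alias keepers is just an illustrative name, and the rule "keep the lowest idclientecrm" is the same assumption as above.

```sql
-- Sketch only: assumes MySQL and that the row with the lowest
-- idclientecrm in each normalized-CUIT group is the one to keep.
delete from clientescrm
where idclientecrm not in (
    select IDtoSave
    from (
        -- Derived table (hypothetical alias "keepers"): one surviving ID per normalized CUIT.
        select min(idclientecrm) as IDtoSave
        from clientescrm
        group by replace(replace(cuit, '-', ''), '_', '')
    ) as keepers
);
```

Running it inside a transaction, or against a copy of the table first, makes it easier to undo if the ID list turns out to be wrong.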

Answer 2

Join the original table to your query? Like this:

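select
 a.cuit,
 a.duplicates,
 c.Idclientecrm
from
(
   -- group on the cleaned value, not the raw column, so that
   -- differently formatted copies of the same CUIT are counted together
   select
      replace(replace(cuit, '-', ''),'_','') as cuit,
      count(*) as duplicates
   from clientescrm
   group by replace(replace(cuit, '-', ''),'_','')
   having count(*) > 1
) a,
clientescrm c
where
a.cuit = replace(replace(c.cuit, '-', ''),'_','')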
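
If the server supports window functions (MySQL 8.0 or later, which is an assumption about your setup), the same three columns can probably be produced without the self-join. This is a sketch rather than a tested drop-in; cuit_clean and t are illustrative names.

```sql
-- Sketch only: requires window-function support (e.g. MySQL 8.0+).
select idclientecrm, cuit_clean as cuit, duplicates
from (
    select idclientecrm,
           replace(replace(cuit, '-', ''), '_', '') as cuit_clean,
           -- number of rows sharing the same normalized CUIT
           count(*) over (
               partition by replace(replace(cuit, '-', ''), '_', '')
           ) as duplicates
    from clientescrm
) t
where duplicates > 1
order by cuit_clean, idclientecrm;
```

The outer where is needed because a window function result cannot be filtered in the same query level that computes it.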

Answer 3

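You also need to put the replace in your GROUP BY. Any result you then see appears more than once in the table. It seems that is what you are looking for.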

select replace(replace(cuit, '-', ''),'_','') as cuit, count(*)
    from clientescrm
    group by replace(replace(cuit, '-', ''),'_','')
    having count(*) > 1
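
If you also want the primary keys but are happy with one row per duplicated CUIT, a possible variation is to collect the IDs with MySQL's group_concat. This is only a sketch; the column alias ids is made up here, and very large groups may be truncated by the group_concat_max_len setting.

```sql
-- Sketch only: assumes MySQL (group_concat is MySQL-specific).
select replace(replace(cuit, '-', ''), '_', '') as cuit,
       count(*) as duplicates,
       -- comma-separated list of the primary keys in each duplicate group
       group_concat(idclientecrm order by idclientecrm) as ids
from clientescrm
group by replace(replace(cuit, '-', ''), '_', '')
having count(*) > 1;
```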

SQL: Finding duplicates in a large table
