Question

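In MySQL, I have a table with two columns (id, uuid), into which I inserted 30 million rows. (PS: uuid values can repeat.)

Now I want to find the duplicate values in the table using MySQL, but the query takes too long.

I want to scan the whole table, but that takes far too long, so I tried querying only the first one million rows, which took 8 seconds.

Then I tried 10 million rows, which took 5 minutes. With 20 million rows, the server seemed to die.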

select count(uuid) as cnt
from uuid_test
where id between 1 and 1000000
group by uuid
having cnt > 1;

Can anyone help me optimize this SQL? Thanks.

Answer 1

Try this query:

SELECT uuid, count(*) cnt FROM uuid_test GROUP BY 1 HAVING cnt>1;

Hope this helps.

Answer 2

The fastest way to find duplicates is usually a correlated subquery rather than aggregation:

select ut.*
from uuid_test ut
where exists (select 1
              from uuid_test ut2
              where ut2.uuid = ut.uuid and
                    ut2.id <> ut.id
             );

This can take advantage of an index on uuid_test(uuid, id).
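
As a rough sketch of that index, assuming it does not already exist: the name idx_uuid_test_uuid_id is made up for illustration, and building it on 30 million rows will itself take some time. Running EXPLAIN on the duplicate-finding query afterwards is one way to check whether MySQL actually picks the index up.

```sql
-- Hypothetical index name; the column order (uuid, id) follows the answer above.
CREATE INDEX idx_uuid_test_uuid_id ON uuid_test (uuid, id);

-- Inspect the plan: the "key" column of the output shows which index,
-- if any, the optimizer chose for each table reference.
EXPLAIN
SELECT ut.*
FROM uuid_test ut
WHERE EXISTS (SELECT 1
              FROM uuid_test ut2
              WHERE ut2.uuid = ut.uuid
                AND ut2.id <> ut.id);
```

If the plan still shows a full scan without the index on the inner table, the subquery is unlikely to be faster than the GROUP BY approach.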

How to find duplicate values in a MySQL table with 30 million rows
