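I'm really stuck right now. My query works, but only on some sample data: on the real database, which has 15,000 clients, I don't even get a response in a reasonable time.
I have the following table: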
-Client-
id
firstname
lastname
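My query has to find all possible duplicates and list them. Suppose client A has id 1, B has id 2 and C has id 3, and all three have the same first name and last name. The output should look like this: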
id | duplicateID | client | duplicate
1 | 2 | A | B
1 | 3 | A | C
2 | 3 | B | C
My query looks like this:
SELECT
c.id AS clientID,
d.id AS duplicateID,
CONCAT(c.firstname, ' ', c.lastname) AS fullName
FROM Client AS c
JOIN Client AS d
ON d.lastname = c.lastname
AND d.firstname = c.firstname
AND d.id != c.id
AND d.id > c.id
ORDER BY fullName, c.id
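To see the query's behaviour on the A/B/C example above, here is a minimal sketch that runs it against an in-memory SQLite database from Python. It assumes SQLite is an acceptable stand-in for the real server: CONCAT is replaced by SQLite's || operator, d.id is added to ORDER BY only so the row order is deterministic, and the names "Anna Muster" and "Ben Other" are hypothetical sample data.

```python
import sqlite3

# Hypothetical sample data: clients 1, 2 and 3 share a name, client 4 is unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Client (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO Client (id, firstname, lastname) VALUES (?, ?, ?)",
    [(1, "Anna", "Muster"), (2, "Anna", "Muster"), (3, "Anna", "Muster"), (4, "Ben", "Other")],
)

# The question's self-join; || stands in for CONCAT, and d.id is appended to
# ORDER BY purely to make the output order deterministic.
query = """
    SELECT c.id AS clientID,
           d.id AS duplicateID,
           c.firstname || ' ' || c.lastname AS fullName
    FROM Client AS c
    JOIN Client AS d
      ON d.lastname = c.lastname
     AND d.firstname = c.firstname
     AND d.id != c.id
     AND d.id > c.id
    ORDER BY fullName, c.id, d.id
"""
pairs = conn.execute(query).fetchall()
for row in pairs:
    print(row)
```

Each pair of clients with the same name appears exactly once, with the smaller id on the left, which matches the expected id/duplicateID table.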
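Is there a way to improve performance without losing any results? I looked at this answer, but it only returns one duplicate per client, and I want all of the duplicates.
Thanks for any help or hints.
Edit, as requested: SQL Fiddle
Answer 0 (score: 1)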
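Here is your query, slightly simplified: the d.id != c.id condition is removed from the ON clause, because d.id > c.id already implies it.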
SELECT c.id AS clientID, d.id AS duplicateID, CONCAT(c.firstname, ' ', c.lastname) AS fullName
FROM Client c JOIN
Client d
ON d.lastname = c.lastname AND d.firstname = c.firstname AND d.id > c.id
ORDER BY fullName, c.id;
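A quick way to convince yourself that dropping d.id != c.id loses no rows is to run both versions side by side. The sketch below does this in SQLite on hypothetical synthetic data (200 clients drawn from a small pool of made-up names, so many of them collide); it is a sanity check, not a proof.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Client (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)")

# Hypothetical synthetic data: 200 clients, names chosen cyclically from a small
# pool, so each (firstname, lastname) combination is shared by 16 or 17 clients.
firsts = ["Anna", "Ben", "Clara"]
lasts = ["Muster", "Meier", "Keller", "Huber"]
conn.executemany(
    "INSERT INTO Client (id, firstname, lastname) VALUES (?, ?, ?)",
    [(i, firsts[i % 3], lasts[i % 4]) for i in range(1, 201)],
)

join_template = """
    SELECT c.id, d.id
    FROM Client c JOIN Client d
      ON d.lastname = c.lastname AND d.firstname = c.firstname {extra}
    ORDER BY c.id, d.id
"""
# Original condition set versus the simplified one.
with_ne = conn.execute(join_template.format(extra="AND d.id != c.id AND d.id > c.id")).fetchall()
without_ne = conn.execute(join_template.format(extra="AND d.id > c.id")).fetchall()
print(len(with_ne), with_ne == without_ne)
```

Both versions return the same rows; the extra inequality only gives the optimizer one more predicate to evaluate.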
Try using an index:
create index client_lastname_firstname_id on client(lastname, firstname, id);
This should help with the join. If you have a lot of data, the ORDER BY may become the performance bottleneck.
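One way to check whether the index is actually picked up is to look at the query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN; it assumes SQLite's planner is representative enough for illustration, and MySQL's optimizer may choose differently. The exact wording of the plan rows also varies between SQLite versions, so only substrings are inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Client (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)")
# The composite index suggested above: equality on lastname and firstname,
# then a range on id, all answerable from the index alone.
conn.execute("CREATE INDEX client_lastname_firstname_id ON Client(lastname, firstname, id)")

query = """
    SELECT c.id, d.id, c.firstname || ' ' || c.lastname AS fullName
    FROM Client c JOIN Client d
      ON d.lastname = c.lastname AND d.firstname = c.firstname AND d.id > c.id
    ORDER BY fullName, c.id
"""
# Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail); keep the detail text.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for detail in plan:
    print(detail)
```

In SQLite the plan shows the inner lookup going through client_lastname_firstname_id, plus a separate temporary B-tree for the ORDER BY, because fullName is a computed expression that no index can supply in sorted order. That extra sort step is the cost the answer warns about for large data.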
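Another option is to put all duplicates of a name on a single row. This only needs a GROUP BY and gives you, for each name, the list of client IDs that share it: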
SELECT CONCAT(c.firstname, ' ', c.lastname) AS fullName,
group_concat(c.id order by c.id) AS clientIDs
FROM Client c
GROUP BY c.firstname, c.lastname;
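If the pairwise id/duplicateID layout is still needed, the grouped rows can be expanded back into pairs on the client side. The sketch below does this in Python with SQLite. Two assumptions to note: HAVING COUNT(*) > 1 is an addition not in the answer's query (it drops names that have no duplicates), and because SQLite's group_concat does not guarantee an order, the ids are sorted in Python rather than with ORDER BY inside group_concat. In MySQL, GROUP_CONCAT output is also truncated to group_concat_max_len (1024 bytes by default), which matters for names with very many duplicates.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Client (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)")
# Hypothetical sample data: clients 1, 2 and 3 share a name, client 4 is unique.
conn.executemany(
    "INSERT INTO Client (id, firstname, lastname) VALUES (?, ?, ?)",
    [(1, "Anna", "Muster"), (2, "Anna", "Muster"), (3, "Anna", "Muster"), (4, "Ben", "Other")],
)

# One row per duplicated name; HAVING (an addition) skips names that occur once.
grouped = conn.execute("""
    SELECT c.firstname || ' ' || c.lastname AS fullName,
           group_concat(c.id) AS clientIDs
    FROM Client c
    GROUP BY c.firstname, c.lastname
    HAVING COUNT(*) > 1
""").fetchall()

pairs = []
for full_name, ids in grouped:
    # group_concat returns a comma-separated string; sort the ids numerically.
    ids_sorted = sorted(int(x) for x in ids.split(","))
    # Every i < j combination becomes one (client, duplicate) row.
    pairs.extend((a, b, full_name) for a, b in combinations(ids_sorted, 2))

print(grouped)
print(pairs)
```

A name shared by n clients expands into n(n-1)/2 pairs, so the database returns one row per name while the pairwise listing is built only where it is needed.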