Question

I'm trying to find the duplicate records in the fixtures table, so I wrote this query:

SELECT *
FROM fixtures f
INNER JOIN (SELECT *
           FROM fixtures s
           GROUP BY s.match_id
           HAVING COUNT(player_id) > 1) dup
       ON f.match_id = dup.match_id;

But the query is really slow, and the table contains only 1000 records. These are the records available:

player_id | match_id  | team_id
  19014       2506172    12573
  19014       2506172    12573
  19015       2506172    12573
  19016       2506172    12573
  19016       2506172    12573
  19016       2506172    12573

The query should return players 19016 and 19014 as duplicates. What am I doing wrong?

Answer 1

If you are looking for duplicates across all three columns, I don't see why you need a join.

SELECT player_id, match_id, team_id, count(*) 
FROM fixtures
GROUP BY player_id, match_id, team_id
HAVING COUNT(*) > 1
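
To see what this query returns, here is a minimal sketch that loads the six sample rows from the question into an in-memory database and runs it. SQLite (through Python's `sqlite3` module) is an assumption, since the thread does not name the database engine. The `copies` alias, the `ORDER BY` and the helper name `find_duplicates` are added for illustration only.

```python
import sqlite3

# Sample rows copied from the question: (player_id, match_id, team_id).
ROWS = [
    (19014, 2506172, 12573),
    (19014, 2506172, 12573),
    (19015, 2506172, 12573),
    (19016, 2506172, 12573),
    (19016, 2506172, 12573),
    (19016, 2506172, 12573),
]


def find_duplicates(rows):
    """Return one (player_id, match_id, team_id, copies) tuple per duplicated row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fixtures (player_id INTEGER, match_id INTEGER, team_id INTEGER)"
    )
    conn.executemany("INSERT INTO fixtures VALUES (?, ?, ?)", rows)
    # Group on all three columns and keep only groups with more than one row.
    result = conn.execute(
        """
        SELECT player_id, match_id, team_id, COUNT(*) AS copies
        FROM fixtures
        GROUP BY player_id, match_id, team_id
        HAVING COUNT(*) > 1
        ORDER BY player_id
        """
    ).fetchall()
    conn.close()
    return result


duplicates = find_duplicates(ROWS)
print(duplicates)
```

On the sample data this reports players 19014 (2 copies) and 19016 (3 copies), and leaves out 19015, which appears only once.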

Answer 2

Why not just use aggregation?

SELECT s.player_id, s.match_id, s.team_id
FROM fixtures s
GROUP BY s.player_id, s.match_id, s.team_id
HAVING COUNT(*) > 1;

However, if I reconsider the question, I would suggest:

SELECT s.player_id
FROM fixtures s
GROUP BY s.player_id
HAVING COUNT(*) > 1;
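
If the goal is to list every copy of a duplicated row rather than one line per group, one option is to join the aggregate back to the table on all three columns. The sketch below assumes SQLite through Python's `sqlite3` module and the sample rows from the question. The `dup` alias is borrowed from the question's query, and the variable names are illustrative.

```python
import sqlite3

# Sample rows copied from the question: (player_id, match_id, team_id).
ROWS = [
    (19014, 2506172, 12573),
    (19014, 2506172, 12573),
    (19015, 2506172, 12573),
    (19016, 2506172, 12573),
    (19016, 2506172, 12573),
    (19016, 2506172, 12573),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fixtures (player_id INTEGER, match_id INTEGER, team_id INTEGER)"
)
conn.executemany("INSERT INTO fixtures VALUES (?, ?, ?)", ROWS)

# The subquery finds the duplicated (player_id, match_id, team_id) groups;
# joining on all three columns returns each stored copy of those groups.
duplicate_rows = conn.execute(
    """
    SELECT f.player_id, f.match_id, f.team_id
    FROM fixtures f
    INNER JOIN (SELECT player_id, match_id, team_id
                FROM fixtures
                GROUP BY player_id, match_id, team_id
                HAVING COUNT(*) > 1) dup
            ON f.player_id = dup.player_id
           AND f.match_id = dup.match_id
           AND f.team_id = dup.team_id
    ORDER BY f.player_id
    """
).fetchall()
conn.close()

for row in duplicate_rows:
    print(row)
```

Joining on `match_id` alone, as in the question, would pair every row of a match with the single group for that match, so the join condition here includes all three columns.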

Answer 3

I think your database browser is limiting the result set (to 1000 rows).

SELECT f2.*
FROM fixtures f
JOIN fixtures f2 on (f.match_id = f2.match_id and f.player_id<f2.player_id)

Not listing the duplicate with the lowest player id makes this faster. All duplicated match ids are still in the result, though.
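
To check what this self-join returns, the sketch below runs it on the six sample rows from the question. SQLite through Python's `sqlite3` module is an assumption (the thread does not state the engine), and the variable names are illustrative.

```python
import sqlite3

# Sample rows copied from the question: (player_id, match_id, team_id).
ROWS = [
    (19014, 2506172, 12573),
    (19014, 2506172, 12573),
    (19015, 2506172, 12573),
    (19016, 2506172, 12573),
    (19016, 2506172, 12573),
    (19016, 2506172, 12573),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fixtures (player_id INTEGER, match_id INTEGER, team_id INTEGER)"
)
conn.executemany("INSERT INTO fixtures VALUES (?, ?, ?)", ROWS)

# Each f2 row is emitted once for every row in the same match
# that has a strictly lower player_id.
self_join_rows = conn.execute(
    """
    SELECT f2.*
    FROM fixtures f
    JOIN fixtures f2
      ON f.match_id = f2.match_id
     AND f.player_id < f2.player_id
    """
).fetchall()
conn.close()

returned_players = sorted({row[0] for row in self_join_rows})
print(len(self_join_rows), returned_players)
```

On this data the join produces 11 rows covering players 19015 and 19016. Player 19015 is included even though it occurs only once, and 19014 is absent because no row in the match has a lower player_id. The result is therefore per match rather than per identical row, so the aggregation queries are likely the closer fit if exact duplicates are wanted.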

Slow performance when searching for duplicates in a table
