I have this MySQL query:
SELECT DISTINCT post.postId,hash,previewUrl,lastRetrieved
FROM post INNER JOIN (tag as t1,taggedBy as tb1,tag as t2,taggedBy as tb2,tag as t3,taggedBy as tb3)
ON post.id=tb1.postId AND tb1.tagId=t1.id AND post.id=tb2.postId AND tb2.tagId=t2.id AND post.id=tb3.postId AND tb3.tagId=t3.id
WHERE ((t1.name="a" AND t2.name="b") OR t3.name="c")
ORDER BY post.postId DESC LIMIT 0,100;
This query takes about 15 seconds to run, while the same query without DISTINCT takes less than a second.
EXPLAIN output for the query with DISTINCT:
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-----------------------+
| 1 | SIMPLE | post | index | PRIMARY | postId | 4 | NULL | 1 | Using temporary |
| 1 | SIMPLE | tb1 | ref | PRIMARY,tagId | PRIMARY | 4 | e621datamirror.post.id | 13 | Using index; Distinct |
| 1 | SIMPLE | t1 | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4 | e621datamirror.tb1.tagId | 1 | Distinct |
| 1 | SIMPLE | tb2 | ref | PRIMARY,tagId | PRIMARY | 4 | e621datamirror.post.id | 13 | Using index; Distinct |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4 | e621datamirror.tb2.tagId | 1 | Distinct |
| 1 | SIMPLE | tb3 | ref | PRIMARY,tagId | PRIMARY | 4 | e621datamirror.post.id | 13 | Using index; Distinct |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4 | e621datamirror.tb3.tagId | 1 | Using where; Distinct |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-----------------------+
7 rows in set (0.01 sec)
EXPLAIN output for the query without DISTINCT:
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-------------+
| 1 | SIMPLE | post | index | PRIMARY | postId | 4 | NULL | 1 | NULL |
| 1 | SIMPLE | tb1 | ref | PRIMARY,tagId | PRIMARY | 4 | e621datamirror.post.id | 13 | Using index |
| 1 | SIMPLE | t1 | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4 | e621datamirror.tb1.tagId | 1 | NULL |
| 1 | SIMPLE | tb2 | ref | PRIMARY,tagId | PRIMARY | 4 | e621datamirror.post.id | 13 | Using index |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4 | e621datamirror.tb2.tagId | 1 | NULL |
| 1 | SIMPLE | tb3 | ref | PRIMARY,tagId | PRIMARY | 4 | e621datamirror.post.id | 13 | Using index |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4 | e621datamirror.tb3.tagId | 1 | Using where |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-------------+
CREATE TABLE `post` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`postId` int(11) NOT NULL,
`hash` varchar(32) COLLATE utf8_bin NOT NULL,
`previewUrl` varchar(512) COLLATE utf8_bin NOT NULL,
`lastRetrieved` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `postId` (`postId`),
UNIQUE KEY `hash` (`hash`),
KEY `postId_2` (`postId`),
KEY `postId_3` (`postId`)
) ENGINE=InnoDB AUTO_INCREMENT=692561 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
CREATE TABLE `tag` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`),
KEY `name_2` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=157876 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
CREATE TABLE `taggedBy` (
`postId` int(11) NOT NULL,
`tagId` int(11) NOT NULL,
PRIMARY KEY (`postId`,`tagId`),
KEY `tagId` (`tagId`),
CONSTRAINT `taggedBy_ibfk_1` FOREIGN KEY (`postId`) REFERENCES `post` (`id`) ON DELETE CASCADE,
CONSTRAINT `taggedBy_ibfk_2` FOREIGN KEY (`tagId`) REFERENCES `tag` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
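What makes this query so slow, and how can I speed it up?
I hope I have provided enough information for you to give me a meaningful answer. If I have left something out, I will gladly add it.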
Answer 0 (score: 1)
Several things are being discussed here, even in @SlimGhost's reasonable (but deleted) answer.
DISTINCT vs. GROUP BY
Although GROUP BY can sometimes be used in place of DISTINCT, please don't do that; they are meant for different things.
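Both require some form of extra effort. (I'll get to the 10x in a moment.) Both have to discover the common values, either across the entire row (for DISTINCT) or for the items being grouped on. This can be done in at least one of two ways. (Probably most engines have both options built in.) Note that DISTINCT or GROUP BY has to be applied after WHERE, and before ORDER BY and LIMIT.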
ORDER BY + LIMIT
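Note that the query is doing DISTINCT over 4 columns: post.postId, hash, previewUrl, lastRetrieved. It is not obvious whether these are all in post or spread across the 7 tables. (Please clarify by qualifying each column with its table.)
Let's assume the JOINs have to be performed in order to find the 4 columns.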
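Suppose there were no DISTINCT. Then the operation is essentially: walk through post sequentially in ORDER BY post.postId order, and stop as soon as the LIMIT is satisfied.
But with DISTINCT, the optimizer cannot make such simplifying assumptions so as to stop short. Instead it has to walk through post sequentially in ORDER BY post.postId order (because of the OR, starting with t1/t2/t3 is not possible; actually, it is not clear that the optimizer will do things in this order), and then apply DISTINCT. That means touching many more rows of post (perhaps 10x?).
Keep in mind that the optimizer has no idea whether postId and hash are 1:1, and so on, so it cannot make simplifying assumptions. Suppose the JOIN produces 200 rows for the smallest postId, and the hash values happen to arrive in descending order. That smells like a "sort" is needed.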
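EXPLAIN FORMAT=JSON SELECT ... might give you some more details.
Ouch. You have both id and UNIQUE(postId)? Get rid of id and make postId the PRIMARY KEY. That alone may speed things up.
What is hash a hash of?
Please use the JOIN ... ON ... syntax.
There are 3 indexes on postId; get rid of the extra two.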
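As a minimal sketch of the index cleanup, using the index names shown in the question's CREATE TABLE output: the UNIQUE KEY on postId already serves every lookup that postId_2 and postId_3 could. Dropping name_2 on tag is my own extension of the same reasoning, not something stated above, so treat it as an assumption. Turning postId into the PRIMARY KEY is deliberately left out, because taggedBy has a foreign key on post.id that would have to be remapped first.

```sql
-- Drop the two plain indexes that duplicate UNIQUE KEY `postId` (`postId`).
ALTER TABLE post
    DROP INDEX postId_2,
    DROP INDEX postId_3;

-- Assumption (not in the advice above): `name_2` duplicates UNIQUE KEY `name` in the same way.
ALTER TABLE tag
    DROP INDEX name_2;
```

Running SHOW CREATE TABLE post afterwards should list only PRIMARY, postId and hash.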
Why use DISTINCT?
Now that I see that all the SELECTed columns come from a single table, and that its rows are clearly easy to tell apart, why even consider using DISTINCT?
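One way to act on that question, sketched here as an alternative formulation rather than anything taken from the original query: the duplicates only appear because each post is joined to every combination of its tags, so testing the tags with EXISTS instead of joining them returns each post at most once and makes DISTINCT unnecessary. The table and column names come from the question's schema; the aliases p, tb and t are mine, and I have not measured this against the real data.

```sql
SELECT p.postId, p.hash, p.previewUrl, p.lastRetrieved
FROM post AS p
WHERE (
        -- post has tag "a" ...
        EXISTS (SELECT 1
                FROM taggedBy AS tb
                JOIN tag AS t ON t.id = tb.tagId
                WHERE tb.postId = p.id AND t.name = 'a')
        -- ... and also tag "b"
    AND EXISTS (SELECT 1
                FROM taggedBy AS tb
                JOIN tag AS t ON t.id = tb.tagId
                WHERE tb.postId = p.id AND t.name = 'b')
      )
   -- or post has tag "c"
   OR EXISTS (SELECT 1
              FROM taggedBy AS tb
              JOIN tag AS t ON t.id = tb.tagId
              WHERE tb.postId = p.id AND t.name = 'c')
ORDER BY p.postId DESC
LIMIT 100;
```

Because each post produces at most one output row, the scan over post in postId order can stop as soon as 100 rows have qualified, which is the early exit that DISTINCT prevents.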
(Update)
JOINs
FROM post INNER JOIN (tag as t1,taggedBy as tb1,...
ON post.id=tb1.postId AND tb1.tagId=t1.id AND ...
-->
FROM post
JOIN taggedBy AS tb1 ON tb1.postId = post.id
JOIN tag AS t1 ON t1.id = tb1.tagId
... (each ON is next to the JOIN it applies to)
Speedup technique
SELECT p2.postId, p2.hash, p2.previewUrl, p2.lastRetrieved
FROM (
SELECT DISTINCT post.postId -- Only the unique key; qualified because taggedBy also has a postId column
FROM post
JOIN ... etc
WHERE ... ...
ORDER BY post.postId DESC -- same direction as the original query
LIMIT 100
) x
JOIN post AS p2 ON x.postId = p2.postId -- self join for getting rest of fields
ORDER BY x.postId DESC -- assuming you need the ordering
This puts the DISTINCT in the inner query, where only one column (postId) is being fetched. (I'm not sure how much this technique will help in your case.)
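For reference, here is one way the elided parts of that derived-table query could be filled in, assuming the joins and WHERE clause are carried over unchanged from the question. The aliases x and p2 are the ones used above; everything else comes from the question's query and schema. Whether this is actually faster on the real data is untested.

```sql
SELECT p2.postId, p2.hash, p2.previewUrl, p2.lastRetrieved
FROM (
    -- Deduplicate on the narrow key only, then cut to 100 rows.
    SELECT DISTINCT post.postId
    FROM post
    JOIN taggedBy AS tb1 ON tb1.postId = post.id
    JOIN tag      AS t1  ON t1.id = tb1.tagId
    JOIN taggedBy AS tb2 ON tb2.postId = post.id
    JOIN tag      AS t2  ON t2.id = tb2.tagId
    JOIN taggedBy AS tb3 ON tb3.postId = post.id
    JOIN tag      AS t3  ON t3.id = tb3.tagId
    WHERE (t1.name = 'a' AND t2.name = 'b') OR t3.name = 'c'
    ORDER BY post.postId DESC
    LIMIT 100
) AS x
-- Fetch the wide columns for just those 100 posts.
JOIN post AS p2 ON p2.postId = x.postId
ORDER BY x.postId DESC;
```

The outer join is on postId rather than id because the inner query returns post.postId, which is a different column from the auto-increment id.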