Question

I have this MySQL query:

SELECT DISTINCT post.postId,hash,previewUrl,lastRetrieved
FROM post INNER JOIN (tag as t1,taggedBy as tb1,tag as t2,taggedBy as tb2,tag as t3,taggedBy as tb3)
ON post.id=tb1.postId AND tb1.tagId=t1.id AND post.id=tb2.postId AND tb2.tagId=t2.id AND post.id=tb3.postId AND tb3.tagId=t3.id
WHERE ((t1.name="a" AND t2.name="b") OR t3.name="c") 
ORDER BY post.postId DESC LIMIT 0,100;

Running this query takes about 15 seconds, while the same query without DISTINCT takes less than a second.

EXPLAIN output for the query with DISTINCT:

+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-----------------------+
| id | select_type | table | type   | possible_keys       | key     | key_len | ref                      | rows | Extra                 |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-----------------------+
|  1 | SIMPLE      | post  | index  | PRIMARY             | postId  | 4       | NULL                     |    1 | Using temporary       |
|  1 | SIMPLE      | tb1   | ref    | PRIMARY,tagId       | PRIMARY | 4       | e621datamirror.post.id   |   13 | Using index; Distinct |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4       | e621datamirror.tb1.tagId |    1 | Distinct              |
|  1 | SIMPLE      | tb2   | ref    | PRIMARY,tagId       | PRIMARY | 4       | e621datamirror.post.id   |   13 | Using index; Distinct |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4       | e621datamirror.tb2.tagId |    1 | Distinct              |
|  1 | SIMPLE      | tb3   | ref    | PRIMARY,tagId       | PRIMARY | 4       | e621datamirror.post.id   |   13 | Using index; Distinct |
|  1 | SIMPLE      | t3    | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4       | e621datamirror.tb3.tagId |    1 | Using where; Distinct |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-----------------------+
7 rows in set (0.01 sec)

EXPLAIN output for the query without DISTINCT:

+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-------------+
| id | select_type | table | type   | possible_keys       | key     | key_len | ref                      | rows | Extra       |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-------------+
|  1 | SIMPLE      | post  | index  | PRIMARY             | postId  | 4       | NULL                     |    1 | NULL        |
|  1 | SIMPLE      | tb1   | ref    | PRIMARY,tagId       | PRIMARY | 4       | e621datamirror.post.id   |   13 | Using index |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4       | e621datamirror.tb1.tagId |    1 | NULL        |
|  1 | SIMPLE      | tb2   | ref    | PRIMARY,tagId       | PRIMARY | 4       | e621datamirror.post.id   |   13 | Using index |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4       | e621datamirror.tb2.tagId |    1 | NULL        |
|  1 | SIMPLE      | tb3   | ref    | PRIMARY,tagId       | PRIMARY | 4       | e621datamirror.post.id   |   13 | Using index |
|  1 | SIMPLE      | t3    | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4       | e621datamirror.tb3.tagId |    1 | Using where |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-------------+

CREATE TABLE `post` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `postId` int(11) NOT NULL,
  `hash` varchar(32) COLLATE utf8_bin NOT NULL,
  `previewUrl` varchar(512) COLLATE utf8_bin NOT NULL,
  `lastRetrieved` datetime NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `postId` (`postId`),
  UNIQUE KEY `hash` (`hash`),
  KEY `postId_2` (`postId`),
  KEY `postId_3` (`postId`)
) ENGINE=InnoDB AUTO_INCREMENT=692561 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

CREATE TABLE `tag` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`),
  KEY `name_2` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=157876 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

CREATE TABLE `taggedBy` (
  `postId` int(11) NOT NULL,
  `tagId` int(11) NOT NULL,
  PRIMARY KEY (`postId`,`tagId`),
  KEY `tagId` (`tagId`),
  CONSTRAINT `taggedBy_ibfk_1` FOREIGN KEY (`postId`) REFERENCES `post` (`id`) ON DELETE CASCADE,
  CONSTRAINT `taggedBy_ibfk_2` FOREIGN KEY (`tagId`) REFERENCES `tag` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

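What is making this query so slow, and how can I speed it up?

I hope I have provided enough information for you to give me meaningful answers. If I have left something out, I will gladly add it.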

Answer 1

Several things are being discussed here, even in @SlimGhost's reasonable (but deleted) answer.

DISTINCT vs GROUP BY

Although GROUP BY can sometimes be used in place of DISTINCT, please don't do that; they are meant for different things.
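
To illustrate the difference, here is a minimal sketch against the taggedBy table from the question. The first two statements return the same rows; the third shows the kind of aggregation GROUP BY is intended for. The alias postCount is made up for this example and is not part of the schema.

```sql
-- De-duplication: this is the job DISTINCT is meant for.
SELECT DISTINCT tagId
    FROM taggedBy;

-- Same result set, but the intent is obscured by GROUP BY.
SELECT tagId
    FROM taggedBy
    GROUP BY tagId;

-- Aggregation: one row per tag together with a computed value.
-- (postCount is a hypothetical alias.)
SELECT tagId, COUNT(*) AS postCount
    FROM taggedBy
    GROUP BY tagId;
```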

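Both require some form of extra effort. (I will get to the 10x later.) Both must discover common values, either across the whole row (for DISTINCT) or across the grouped items (for GROUP BY). This can be done in at least two ways. (Most engines probably have both options built in.) Note that DISTINCT or GROUP BY must logically come after WHERE, and before ORDER BY and LIMIT.

Keep some kind of internal associative array while generating the output. This is practical if the optimizer can see that there will not be "too many" distinct values.
Sort the output, then make a pass over it to de-duplicate or group. This works regardless of size.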

ORDER BY + LIMIT

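Note that the query is doing DISTINCT over 4 columns: post.postId, hash, previewUrl, lastRetrieved. It is not obvious whether these are all in post or scattered across the 7 tables. (Please clarify by qualifying each column with its table.)

Let's assume the JOINs need to be done in order to find the 4 columns.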

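Suppose there is no DISTINCT. Then the operations are:

Walk through post in the order given by ORDER BY post.postId.
For each such row, do the JOINs and check the WHERE.
Stop after 100 rows have passed the WHERE.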

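But with DISTINCT, the optimizer cannot make such simplifying assumptions in order to stop early. Instead:

Walk through post in the order given by ORDER BY post.postId. (Because of the OR, starting with t1/t2/t3 is not feasible.) Actually, it is not clear that the optimizer will even do things in this order.
For each such row, do the JOINs and check the WHERE.
Do the DISTINCT.
Stop after 100 distinct rows have passed the WHERE. Note: this may involve many more rows from post (perhaps 10x?).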

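Keep in mind that the optimizer knows nothing about whether postId and hash are 1:1, and so on. So it cannot make simplifying assumptions. Suppose the JOIN yields 200 rows for the smallest postId, and hash happens to come out in descending order. That smells like a "sort" is needed.

EXPLAIN FORMAT=JSON SELECT ... may give you some more details.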
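
Spelled out for the query in the question, the statement might look like the sketch below. It assumes a MySQL version that supports FORMAT=JSON (5.6 or later); the query itself is copied unchanged from the question.

```sql
-- Assumes MySQL 5.6+; the SELECT is the original query, unmodified.
EXPLAIN FORMAT=JSON
SELECT DISTINCT post.postId,hash,previewUrl,lastRetrieved
FROM post INNER JOIN (tag as t1,taggedBy as tb1,tag as t2,taggedBy as tb2,tag as t3,taggedBy as tb3)
ON post.id=tb1.postId AND tb1.tagId=t1.id AND post.id=tb2.postId AND tb2.tagId=t2.id AND post.id=tb3.postId AND tb3.tagId=t3.id
WHERE ((t1.name="a" AND t2.name="b") OR t3.name="c")
ORDER BY post.postId DESC LIMIT 0,100;
```

Running it once with and once without DISTINCT and comparing the two JSON documents should show where the temporary table and the de-duplication step enter the plan.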

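Ouch. You have both id and UNIQUE(postId)? Get rid of id and make postId the PRIMARY KEY. That alone might speed things up.

What is hash a hash of?

Please use the JOIN ... ON ... syntax.

There are 3 indexes on postId; get rid of the extra two.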
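
Dropping the redundant indexes could look like the following sketch. The index names are taken from the CREATE TABLE output in the question. The second statement covers the same pattern on tag, which the answer does not mention but which is visible in the schema, so treat it as an optional extra.

```sql
-- post: UNIQUE KEY `postId` already indexes postId, so the two plain
-- secondary indexes on the same column are redundant.
ALTER TABLE post
    DROP INDEX postId_2,
    DROP INDEX postId_3;

-- tag: UNIQUE KEY `name` already indexes name, so `name_2` is redundant too.
-- (Not part of the original answer; inferred from the schema.)
ALTER TABLE tag
    DROP INDEX name_2;
```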

Why use DISTINCT?

Now that I see that all the SELECTed columns come from one table, and that its rows are obviously trivially distinct, why even consider using DISTINCT?
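
One way to act on this, sketched below, is to avoid producing duplicates in the first place by testing for the tags with EXISTS instead of joining to them. This rewrite is not from the answer itself; it is an illustration that assumes the intent is "posts tagged with both a and b, or tagged with c", and the aliases p, tb and t are made up for the example.

```sql
-- Each post row appears at most once, so no DISTINCT is needed.
SELECT p.postId, p.hash, p.previewUrl, p.lastRetrieved
    FROM post AS p
    WHERE (
              EXISTS ( SELECT 1
                         FROM taggedBy AS tb
                         JOIN tag AS t  ON t.id = tb.tagId
                         WHERE tb.postId = p.id  AND  t.name = 'a' )
          AND EXISTS ( SELECT 1
                         FROM taggedBy AS tb
                         JOIN tag AS t  ON t.id = tb.tagId
                         WHERE tb.postId = p.id  AND  t.name = 'b' )
          )
       OR EXISTS ( SELECT 1
                     FROM taggedBy AS tb
                     JOIN tag AS t  ON t.id = tb.tagId
                     WHERE tb.postId = p.id  AND  t.name = 'c' )
    ORDER BY p.postId DESC
    LIMIT 100;
```

Because the outer query reads only post, the ORDER BY ... LIMIT can in principle stop after 100 qualifying rows; whether the optimizer actually does so is worth checking with EXPLAIN.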

(Update)

JOIN

FROM post INNER JOIN (tag as t1,taggedBy as tb1,...
                   ON post.id=tb1.postId AND tb1.tagId=t1.id AND ...
 -->
FROM post
JOIN taggedBy  AS tb1 ON tb1.postId = post.id
JOIN tag       AS  t1 ON  t1.id     = tb1.tagId
...  (each ON is next to the JOIN it applies to)

Speed-up technique

SELECT p2.postId, p2.hash, p2.previewUrl, p2.lastRetrieved
    FROM (
        SELECT DISTINCT post.postId      -- Only the one key column (qualified, since taggedBy also has a postId)
            FROM post
            JOIN ... etc
            WHERE ... ...
            ORDER BY post.postId DESC    -- DESC to match the original query
            LIMIT 100
         ) x
    JOIN post AS p2  ON x.postId = p2.postId   -- self join for getting rest of fields
    ORDER BY x.postId DESC   -- assuming you need the ordering

This puts the DISTINCT in the inner query, where you are fetching only one column (postId). (I am not sure how much this technique will help in your case.)
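
With the joins from the question filled in, the technique might look like the sketch below. It assumes the schema as posted (post.id is still the PRIMARY KEY and taggedBy.postId references it), so the outer self join matches on postId rather than id.

```sql
SELECT p2.postId, p2.hash, p2.previewUrl, p2.lastRetrieved
    FROM (
        -- De-duplicate a single narrow column instead of four wide ones.
        SELECT DISTINCT post.postId
            FROM post
            JOIN taggedBy AS tb1  ON tb1.postId = post.id
            JOIN tag      AS t1   ON t1.id = tb1.tagId
            JOIN taggedBy AS tb2  ON tb2.postId = post.id
            JOIN tag      AS t2   ON t2.id = tb2.tagId
            JOIN taggedBy AS tb3  ON tb3.postId = post.id
            JOIN tag      AS t3   ON t3.id = tb3.tagId
            WHERE (t1.name = 'a' AND t2.name = 'b')
               OR t3.name = 'c'
            ORDER BY post.postId DESC
            LIMIT 100
         ) AS x
    JOIN post AS p2  ON p2.postId = x.postId   -- fetch the remaining columns
    ORDER BY x.postId DESC;
```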

Why does DISTINCT make this query take 10x longer than without it?
