This is for InnoDB on MySQL 5.7.
I have a set of 4 related, cascading queries:
SELECT DISTINCT A, COUNT(*) FROM MYTABLE
WHERE D IN ? AND A > ?
GROUP BY A ORDER BY A LIMIT 100
SELECT DISTINCT B, COUNT(*) FROM MYTABLE
WHERE A = ? AND D IN ? AND B > ?
GROUP BY B ORDER BY B LIMIT 100
SELECT DISTINCT C, COUNT(*) FROM MYTABLE
WHERE A = ? AND B = ? AND D IN ? AND C > ?
GROUP BY C ORDER BY C LIMIT 100
SELECT E, F, G, H FROM MYTABLE
WHERE A = ? AND B = ? AND C = ? AND D IN ? AND ID > ?
ORDER BY ID LIMIT 100
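What is the minimal set of indexes such that every query can use one of them to filter on each condition in its WHERE clause and also to speed up the ORDER BY?
Based on my understanding of composite indexes, I would need: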
CREATE INDEX INDEX01 ON MYTABLE (D, A)
CREATE INDEX INDEX02 ON MYTABLE (A, D, B)
CREATE INDEX INDEX03 ON MYTABLE (A, B, D, C)
CREATE INDEX INDEX04 ON MYTABLE (A, B, C, D)
(ID is the primary key column.)
Is this correct?
I suppose that if I reorder the WHERE clauses, I could get by with a single composite index:
SELECT DISTINCT A, COUNT(*) FROM MYTABLE
WHERE D IN ? AND A > ?
GROUP BY A ORDER BY A LIMIT 100
SELECT DISTINCT B, COUNT(*) FROM MYTABLE
WHERE D IN ? AND A = ? AND B > ?
GROUP BY B ORDER BY B LIMIT 100
SELECT DISTINCT C, COUNT(*) FROM MYTABLE
WHERE D IN ? AND A = ? AND B = ? AND C > ?
GROUP BY C ORDER BY C LIMIT 100
SELECT E, F, G, H FROM MYTABLE
WHERE D IN ? AND A = ? AND B = ? AND C = ? AND ID > ?
ORDER BY ID LIMIT 100
Then I would only need:
CREATE INDEX INDEX01 ON MYTABLE (D, A, B, C)
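Is this correct?
However, I suspect that ordering the WHERE clauses this way is not optimal. My reasons for always trying to put the "IN" and ">" conditions as the last 2 WHERE clauses are:
1. MySQL has to do more work for "IN" (comparing against multiple values) than for "=", and, given my data set and what I filter on, this clause will probably prune fewer rows.
2. The ">" condition is mainly there for pagination, so in some cases it prunes almost nothing.
Is my understanding correct?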
Answer 0 (score: 1)
Don't use both DISTINCT and GROUP BY in the same query. Because of the aggregate (COUNT), you probably need the GROUP BY, so drop the DISTINCT.
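The redundancy of DISTINCT alongside GROUP BY can be checked with a small experiment. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the miniature MYTABLE and its sample rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, and this tiny table with
# invented rows stands in for the real MYTABLE.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MYTABLE (ID INTEGER PRIMARY KEY, A INTEGER, D INTEGER)"
)
sample_rows = [(1, 10), (1, 20), (2, 10), (2, 10), (3, 30), (3, 10)]
conn.executemany("INSERT INTO MYTABLE (A, D) VALUES (?, ?)", sample_rows)

# The query as written in the question, with both DISTINCT and GROUP BY.
with_distinct = conn.execute(
    "SELECT DISTINCT A, COUNT(*) FROM MYTABLE "
    "WHERE D IN (10, 20) AND A > 0 "
    "GROUP BY A ORDER BY A LIMIT 100"
).fetchall()

# The same query with DISTINCT removed.
without_distinct = conn.execute(
    "SELECT A, COUNT(*) FROM MYTABLE "
    "WHERE D IN (10, 20) AND A > 0 "
    "GROUP BY A ORDER BY A LIMIT 100"
).fetchall()

# GROUP BY A already yields one row per value of A, so DISTINCT has
# nothing left to remove.
print(with_distinct == without_distinct)
```

Because each group produces exactly one output row, the two result sets are identical; DISTINCT only adds a step for the server to consider.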
For GROUP BY x ORDER BY x LIMIT 100, the following may help:
INDEX(x) -- or INDEX(x, ...)
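So, include this, just in case. That is, the Optimizer may choose to use the index to handle the GROUP BY + ORDER BY + LIMIT instead of looking at the WHERE. If it decides to use the WHERE, then...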
WHERE D IN ? AND A > ?
INDEX(D, A)
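can leapfrog ("MRR") through the D values and scan A within each one, but it cannot consume any of the GROUP BY or ORDER BY, because the rows arrive ordered by A only within each D value, not overall.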
WHERE A = ? AND D IN ? AND B > ?
INDEX(A, D, B)
Put any column tested with '=' first in the index. The rest of the logic is as above.
WHERE A = ? AND B = ? AND D IN ? AND C > ?
INDEX(A, B, D, C) or INDEX(B, A, D, C)
(Same logic.)
WHERE A = ? AND B = ? AND C = ? AND D IN ? AND ID > ?
INDEX(A,B,C, -- in any order, then
D, ID) -- at end, in this order.
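The pattern behind the four indexes above ('=' columns first, then the IN column, then the single range column) can be written down as a small helper. This is only a sketch of the rule of thumb as stated here, not anything MySQL provides; the function name suggest_index_columns and its parameters are hypothetical.

```python
def suggest_index_columns(equal_cols, in_cols, range_col=None):
    """Order columns for a composite index by the rule of thumb above.

    Hypothetical helper: '=' columns come first (in any order among
    themselves), then IN columns, then at most one range column.
    """
    columns = list(equal_cols) + list(in_cols)
    if range_col is not None:
        columns.append(range_col)
    return columns


def as_index(columns):
    """Render a column list in the INDEX(...) notation used in this answer."""
    return "INDEX(" + ", ".join(columns) + ")"


# The four WHERE clauses from the question, described by column role.
suggestions = [
    suggest_index_columns([], ["D"], "A"),
    suggest_index_columns(["A"], ["D"], "B"),
    suggest_index_columns(["A", "B"], ["D"], "C"),
    suggest_index_columns(["A", "B", "C"], ["D"], "ID"),
]

for cols in suggestions:
    print(as_index(cols))
```

The helper keeps the '=' columns in the order they were passed in; as noted above, any permutation of those leading columns serves the same WHERE clause equally well.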
So, for that set of 4 statements, I recommend these 4 or 5 indexes, with the columns in the order given:
INDEX(D, A)
INDEX(A, D, B)
INDEX(B, A, D, C) -- I picked that one to get one starting with B
INDEX(C, B, A, D, ID)
INDEX(ID) -- but don't add if you already have `PRIMARY KEY(ID)`
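As a bonus, with those indexes the first 3 SELECTs each have a "covering" index, which gives an extra performance boost. The last SELECT would need a 9-column index to be "covering"; that is too many.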
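The order of the conditions AND'd together in the WHERE makes no difference. So, I think I can ignore the rest of your question.
(Caveat: before about 5.6, the leapfrogging did not exist, so the "optimal" set of indexes would be something else.)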