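The following query runs against user_chars (~20 million records) and user_data (~10 million records). It runs too slowly, and I'm wondering whether a better composite index could improve the situation.
Any thoughts on what the best composite index would be?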
SELECT username, title, status
FROM (
SELECT username, title, status
FROM user_chars w, user_data r
WHERE w.user_id = r.user_id
AND (status < '300' OR is_admin = '1')
AND (
(rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
...
)
GROUP BY w.user_id
HAVING COUNT(*) >= 3
) data
WHERE username != '0'
AND title != '0'
Here are the tables:
CREATE TABLE user_data (
user_id int(10) unsigned NOT NULL AUTO_INCREMENT,
username decimal(17,14) DEFAULT NULL,
title decimal(17,14) DEFAULT NULL,
status smallint(6) unsigned NOT NULL,
is_admin tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (user_id),
KEY username (username),
KEY title (title),
KEY status (status),
KEY is_admin (is_admin),
KEY chars_avg_index (user_id,username,title,status)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE user_chars (
user_id int(10) unsigned NOT NULL,
rating_id char(32) DEFAULT NULL,
rating tinyint(3) unsigned NOT NULL,
PRIMARY KEY (user_id),
KEY rating_id (rating_id),
KEY rating (rating),
KEY chars_index (user_id,rating_id,rating)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Edit: added the EXPLAIN output
+----+-------------+------------+--------+-----------------------------------------+-------------+---------+-----------+-------+-----------------------------------------------------------+
| id | select_type | table      | type   | possible_keys                           | key         | key_len | ref       | rows  | Extra                                                     |
+----+-------------+------------+--------+-----------------------------------------+-------------+---------+-----------+-------+-----------------------------------------------------------+
| 1  | PRIMARY     | <derived2> | ALL    | NULL                                    | NULL        | NULL    | NULL      | 3668  | Using where                                               |
| 2  | DERIVED     | w          | range  | user_id,rating_id,rating,chars_index    | chars_index | 98      | NULL      | 13215 | Using where; Using index; Using temporary; Using filesort |
| 2  | DERIVED     | r          | eq_ref | PRIMARY,status,is_admin,chars_avg_index | PRIMARY     | 4       | w.user_id | 1     | Using where                                               |
+----+-------------+------------+--------+-----------------------------------------+-------------+---------+-----------+-------+-----------------------------------------------------------+
Answer 0 (score: 2)
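Looking at the EXPLAIN output for this query, it appears that MySQL applies the inner query's WHERE clause to user_chars before performing the join with user_data. So adding an index on (rating_id, rating) to user_chars (without user_id) should help the inner query's WHERE clause: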
ALTER TABLE user_chars ADD INDEX (rating_id, rating);
Edit: this behavior depends on the number of rows in each table, so posting the EXPLAIN output would be helpful :]
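As a rough sketch of how one might check whether the new index is picked up, the statements below list the indexes on user_chars and then run EXPLAIN on the rating filter alone. Only the four ranges shown in the question are used (the elided ones are left out), and the bare SELECT of user_id is an assumption made for illustration, not part of the original query.

```sql
-- Confirm that the (rating_id, rating) index now exists.
SHOW INDEX FROM user_chars;

-- Check which index MySQL chooses for the rating filter on its own.
EXPLAIN
SELECT w.user_id
FROM user_chars w
WHERE (w.rating_id = 'rating1' AND w.rating BETWEEN 55 AND 65)
   OR (w.rating_id = 'rating2' AND w.rating BETWEEN 50 AND 60)
   OR (w.rating_id = 'rating3' AND w.rating BETWEEN 30 AND 40)
   OR (w.rating_id = 'rating4' AND w.rating BETWEEN 90 AND 100);
```

The optimizer may still prefer the covering chars_index, so the key column of the EXPLAIN row is the thing to look at.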
Edit 2: I would also rewrite the query as follows:
SELECT username, title, status
FROM user_chars w, user_data r
WHERE w.user_id = r.user_id
AND (status < '300' OR is_admin = '1')
AND (
(rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
...
)
AND username != '0'
AND title != '0'
GROUP BY w.user_id
HAVING COUNT(*) >= 3
Answer 1 (score: 1)
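That is an interesting execution plan. I'm afraid I can't offer any particularly specific advice, mainly because I haven't managed to come up with simple test data that convinces my MySQL server to use the same plan.
I do have a few general suggestions: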
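You don't need the nested query: you can get the same effect with HAVING COUNT(*) >= 3 AND username != '0' AND title != '0'. Alternatively, you could try moving the username and title conditions into the inner WHERE clause.
My testing suggests that MySQL is not smart enough to use index merge and/or range optimization for the status < '300' OR is_admin = '1' condition, even when I create an index on (is_admin, status). Creating a single column that encodes both values might be a good idea, preferably so that only a single range comparison against it is needed.
You might also consider dropping any indexes this query does not use, unless other queries need them. Unused indexes just take up space, slow down INSERTs, and confuse the query planner.
If you haven't done so recently, run ANALYZE TABLE on the tables and see whether the execution plan changes.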
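A minimal sketch of the single-column idea and the ANALYZE TABLE step follows. The column name effective_status is hypothetical, and the encoding assumes is_admin only ever holds 0 or 1; admins are mapped to 0 so that they always fall below the 300 threshold.

```sql
-- Hypothetical column: 0 for admins, otherwise the row's own status.
ALTER TABLE user_data
  ADD COLUMN effective_status SMALLINT UNSIGNED NOT NULL DEFAULT 0,
  ADD INDEX effective_status (effective_status);

-- Backfill existing rows (assumes is_admin is always 0 or 1).
UPDATE user_data
SET effective_status = IF(is_admin = 1, 0, status);

-- The OR condition can then be written as a single range comparison:
--   WHERE effective_status < 300

-- Refresh index statistics, then re-run EXPLAIN to see if the plan changes.
ANALYZE TABLE user_chars, user_data;
```

The extra column has to be kept in sync whenever status or is_admin changes, for example in application code or a trigger.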
Answer 2 (score: 0)
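The current structure of the user_data table unfortunately prevents efficient use of any index.
Basically, the overall condition on the data fetched from user_data is as follows: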
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
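This condition should be applied before aggregation; otherwise the aggregation processes superfluous data.
Indexes work best when you search for something equal to something else and the conditions are joined with AND; your case is exactly the opposite. So, to optimize the query, you could introduce a denormalized column that somehow stores the result of (username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')) and is indexed. Until then, we'll carry on with what we have.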
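One way such a denormalized column might look is sketched below. The name is_candidate is hypothetical, and IFNULL is used because username and title are nullable, so the condition can evaluate to NULL.

```sql
-- Hypothetical flag column holding the precomputed filter result.
ALTER TABLE user_data
  ADD COLUMN is_candidate TINYINT(1) NOT NULL DEFAULT 0,
  ADD INDEX is_candidate (is_candidate);

-- Backfill: rows where the condition is NULL are treated as not matching.
UPDATE user_data
SET is_candidate = IFNULL(
      username != 0 AND title != 0 AND (status < 300 OR is_admin = 1),
      0);

-- The user_data part of the filter then reduces to:
--   WHERE is_candidate = 1
```

An index on a two-valued flag tends to help only when a small share of rows qualify, and the flag has to be recomputed whenever any of the four source columns changes.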
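You join the result with user_chars, where the filter contains multiple ORs, but all of them operate on rating_id and rating. Because the rating column is more selective (it has more distinct values), it is better to put it on the left of a composite index (rating, rating_id). With that index in place you no longer need the indexes on (rating) and (rating_id, rating); just drop them.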
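As a sketch, the index change could be made as below. The index name rating_rating_id is hypothetical. The single-column rating key from the question's schema is dropped because it is a left prefix of the new index; a (rating_id, rating) index would need to be dropped under whatever name it was given when it was created.

```sql
-- Add the composite index and drop the single-column key it supersedes.
ALTER TABLE user_chars
  ADD INDEX rating_rating_id (rating, rating_id),
  DROP INDEX rating;
```

The answers in this thread suggest opposite column orders, so comparing the EXPLAIN output with each index in place is probably the safest way to choose between them.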
Now, I'm not sure whether MySQL can perform this optimization by itself, so you need to compare the execution of the following two queries:
SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND (
(rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
HAVING COUNT(*) >= 3
and the second one:
SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND rating_id in ('rating1', 'rating2', 'rating3', 'rating4')
AND rating BETWEEN 30 AND 100 -- adjust the list above and these bounds to cover the conditions elided as ... in your query
AND (
(rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
HAVING COUNT(*) >= 3
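The latter query may execute faster because it contains explicit hints that let our index be used. In addition, both queries select only user_id values rather than wasting memory on extra columns during aggregation. Now you can join the result of the fastest query to the user_data table: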
SELECT username, title, status
FROM (
SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND rating_id in ('rating1', 'rating2', 'rating3', 'rating4')
AND rating BETWEEN 30 AND 100
AND (
(rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
HAVING COUNT(*) >= 3
) as user_ids JOIN user_data USING (user_id);