我遇到了MySQL查询性能问题。
表(InnoDB):
+--------------------+---------------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+---------------------+------+-----+-------------------+-------+
| st_resource_id | varchar(32) | NO | MUL | NULL | |
| st_sub_resource_id | varchar(32) | YES | | NULL | |
| st_title | varchar(500) | YES | | NULL | |
| st_resource_type | varchar(100) | NO | MUL | NULL | |
| st_site_id | tinyint(4) | NO | MUL | NULL | |
| st_time | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| st_user_id | int(10) unsigned | YES | | NULL | |
| st_full_access | tinyint(1) unsigned | YES | | NULL | |
+--------------------+---------------------+------+-----+-------------------+-------+
索引:
+---------------+------------+------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------------+------------+------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
| nr_statistics | 1 | resource_id | 1 | st_resource_id | A | 1546165 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | resource_id | 2 | st_sub_resource_id | A | 1546165 | NULL | NULL | YES | BTREE | |
| nr_statistics | 1 | st_time | 1 | st_time | A | 1546165 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_site_id | 1 | st_site_id | A | 16 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_resource_type | 1 | st_resource_type | A | 16 | 10 | NULL | | BTREE | |
+---------------+------------+------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
查询:
SELECT st_resource_id AS docId, count(*) AS cnt
FROM nr_statistics
WHERE
st_resource_type = 'document'
AND st_sub_resource_id = 'text'
AND st_time > DATE_SUB(NOW(), INTERVAL 7 DAY)
AND st_site_id = 1
GROUP BY st_resource_id
ORDER BY cnt DESC
LIMIT 0, 5;
查询计划:
+----+-------------+---------------+-------+-------------------------------------+-------------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+-------------------------------------+-------------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | nr_statistics | index | st_time,st_site_id,st_resource_type | resource_id | 197 | NULL | 1581044 | Using where; Using temporary; Using filesort |
+----+-------------+---------------+-------+-------------------------------------+-------------+---------+------+---------+----------------------------------------------+
表有~1,666,383行。查询运行速度非常慢。在MySQL进程列表中,我在“复制到tmp表阶段”中长时间(> 1分钟)看到此查询。查询会产生大量I / O负载。我无法理解如何解决问题并加快查询执行速度。
如果问题是索引错误的结果,那么哪些索引是正确的?
UPD。我创建了新的综合索引:
| nr_statistics | 1 | st_site_id_2 | 1 | st_site_id | A | 16 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_site_id_2 | 2 | st_resource_type | A | 16 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_site_id_2 | 3 | st_sub_resource_id | A | 752018 | NULL | NULL | YES | BTREE | |
| nr_statistics | 1 | st_site_id_2 | 4 | st_time | A | 1504037 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_site_id_2 | 5 | st_resource_id | A | 1504037 | NULL | NULL | | BTREE | |
现在查询计划是:
+----+-------------+---------------+-------+---------------+--------------+---------+------+-------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+--------------+---------+------+-------+-----------------------------------------------------------+
| 1 | SIMPLE | nr_statistics | range | st_site_id_2 | st_site_id_2 | 406 | NULL | 21168 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+---------------+-------+---------------+--------------+---------+------+-------+-----------------------------------------------------------+
查询现在运行得非常快(为0.0x秒),但我必须强制使用新索引:
SELECT st_resource_id as docId, count( * ) AS Cnt
FROM nr_statistics
USE INDEX (st_site_id_2)
WHERE st_resource_type = 'document'
AND st_sub_resource_id = 'text'
AND st_time > DATE_SUB( NOW( ) , INTERVAL 7 DAY )
AND st_site_id = 1
GROUP BY st_resource_id
ORDER BY cnt DESC
LIMIT 0 , 5;
虽然问题已经解决(不是很漂亮但有效),但我仍有一些悬而未决的问题(见评论)。
答案 0 :(得分:2)
在(st_site_id, st_resource_type, st_sub_resourse_id, st_time, st_resource_id)
上创建综合索引。
但是,您仍然会在计划中使用temporary
和filesort
,因为您在COUNT(*)
上订购了不可索引的内容。
如果您需要快速且经常地运行此查询,则必须创建一个聚合表,该表将存储每个站点/资源/子课程/周组合的计数,并在触发器中更新它。
答案 1 :(得分:1)
您是否尝试在st_resource_type, st_resource_id, st_time and st_site_id
上创建综合索引?它看起来像你有几个索引,但大多数是在一列,或可能是2列。通过使用您需要的更多列的复合索引,可以提高性能。
答案 2 :(得分:0)
使用多个where子句进行查询时,编写它们的顺序应与编写查询的顺序相匹配。
在您的特定情况下,它将是:
CREATE INDEX stats_index ON nr_statistics (st_resource_type, st_sub_resource_id, st_time, st_site_id);
这应该会给你一个非常好的速度提升。