Question

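I have an InnoDB table with 11 columns and about 5 million records, and I run a query against it to find the top 10 records with the highest sum. The table schema is as follows.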

id (int 11) (primary key)
activity_id(varchar 250)
activity_type (varchar 10)
advertised_time (timestamp)
advertised_train_ident(int 11)
technical_train_ident(int 11)
location_signature(varchar 10)
time_at_location(timestamp)
information_owner(varchar 100)
created_at(timestamp)
updated_at(timestamp)

The indexes on the table are:

id - primary key
location_signature,activity_type, advertised_time - composite index (name is search)

I am using the following query to fetch records from the table above; it takes 10 to 12 seconds to complete.

SELECT location_signature, activity_type,  
SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 THEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) else 0 END) as delay_time, 
count(id) as total_train_count, 
SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 THEN 1 ELSE 0 END) as delayed_train_count 
from `train_announcements` 
where `advertised_time` >= '2019-04-01 10:00:00' and `advertised_time` <= '2019-04-30 10:00:00' 
group by `location_signature`, `activity_type` 
order by `delay_time` desc 
limit 10 offset 0;

The EXPLAIN output for this query is as follows:

+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
| id | select_type | table                      | type  | possible_keys | key     | key_len | ref  | rows   | Extra                                        |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | train_announcements        | index | search        | search  | 84      | NULL | 4910024| Using where; Using temporary; Using filesort |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+

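Note that the table collation is utf8mb4_unicode_ci because the location_signature field contains special characters.

It would be great if someone could suggest a way to improve the performance of this query. Thanks in advance.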

Answer 1

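Look at the index and make sure advertised_time is its leftmost column. The WHERE clause filters only on advertised_time, and the existing search index starts with location_signature, which is why EXPLAIN shows a full index scan (type index, about 4.9 million rows).

It could also be useful to add time_at_location to the index, so the query can be answered from the index alone without reading the table rows.

Index for table train_announcements

columns (advertised_time, location_signature, activity_type, time_at_location)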
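
The index described above could be created roughly as follows. This is only a sketch: the index name idx_adv_covering is made up for illustration, and the EXPLAIN check uses a reduced form of the query. If the optimizer picks the new index, the plan would be expected to show type range and "Using index" in Extra, although "Using temporary; Using filesort" will most likely remain because of the GROUP BY and ORDER BY.

```sql
-- Hypothetical index name; the column order follows the suggestion above,
-- with the range-filtered column (advertised_time) first.
ALTER TABLE train_announcements
  ADD INDEX idx_adv_covering
    (advertised_time, location_signature, activity_type, time_at_location);

-- Check which index the optimizer chooses for the date-range filter.
EXPLAIN
SELECT location_signature, activity_type, COUNT(*) AS total_train_count
FROM train_announcements
WHERE advertised_time BETWEEN '2019-04-01 10:00:00' AND '2019-04-30 10:00:00'
GROUP BY location_signature, activity_type;
```

Building an index on a table of about 5 million rows takes some time, so it may be worth trying on a copy of the table first.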

SELECT location_signature
  , activity_type
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 
            THEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) 
            ELSE 0 END) as delay_time
  , count(id) as total_train_count
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 
            THEN 1 
            ELSE 0 END) as delayed_train_count 
from `train_announcements` 
where `advertised_time` BETWEEN '2019-04-01 10:00:00' and '2019-04-30 10:00:00' 
group by `location_signature`, `activity_type` 
order by `delay_time` desc 
limit 10 offset 0;

Also, if id is never NULL, try using count(*) instead of count(id):

SELECT location_signature
  , activity_type
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 
            THEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) 
            ELSE 0 END) as delay_time
  , count(*) as total_train_count
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 
            THEN 1 
            ELSE 0 END) as delayed_train_count 
from `train_announcements` 
where `advertised_time` BETWEEN '2019-04-01 10:00:00' and '2019-04-30 10:00:00' 
group by `location_signature`, `activity_type` 
order by `delay_time` desc 
limit 10 offset 0;

Or, if you really need id as well, try adding that column to the composite index:

      (advertised_time, location_signature, activity_type, time_at_location, id )
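
As a sketch, that variant could be created as shown below; the index name idx_adv_covering_id is made up. Keep in mind that InnoDB secondary indexes already store the primary key columns, so when id is the primary key (as it is in this table) the index without id would normally cover count(id) too, and listing id mainly makes that explicit.

```sql
-- Hypothetical index name; columns are the ones listed above.
ALTER TABLE train_announcements
  ADD INDEX idx_adv_covering_id
    (advertised_time, location_signature, activity_type, time_at_location, id);
```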

Answer 2

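Build and maintain a summary table, for example with daily subtotals. The "report" then runs against that much smaller table and is therefore much faster.

More: http://mysql.rjweb.org/doc.php/summarytables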
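
A minimal sketch of what such a summary table might look like for this question follows. The table name train_delay_daily and its column names are assumptions made for illustration, and the sketch assumes location_signature and activity_type are never NULL. Because it stores whole-day subtotals, it cannot reproduce a window that starts and ends at 10:00 exactly; hourly buckets would be needed for that.

```sql
-- Hypothetical summary table: one row per day, location and activity type.
CREATE TABLE train_delay_daily (
  day                 DATE         NOT NULL,
  location_signature  VARCHAR(10)  NOT NULL,
  activity_type       VARCHAR(10)  NOT NULL,
  delay_minutes       BIGINT       NOT NULL,
  train_count         INT UNSIGNED NOT NULL,
  delayed_count       INT UNSIGNED NOT NULL,
  PRIMARY KEY (day, location_signature, activity_type)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Refresh one day of subtotals (run once per day, e.g. from a scheduled job).
-- Re-running it for the same day overwrites that day's rows.
INSERT INTO train_delay_daily
  (day, location_signature, activity_type, delay_minutes, train_count, delayed_count)
SELECT DATE(advertised_time),
       location_signature,
       activity_type,
       SUM(CASE WHEN TIMESTAMPDIFF(MINUTE, advertised_time, time_at_location) > 0
                THEN TIMESTAMPDIFF(MINUTE, advertised_time, time_at_location)
                ELSE 0 END),
       COUNT(*),
       SUM(CASE WHEN TIMESTAMPDIFF(MINUTE, advertised_time, time_at_location) > 0
                THEN 1 ELSE 0 END)
FROM train_announcements
WHERE advertised_time >= '2019-04-01 00:00:00'
  AND advertised_time <  '2019-04-02 00:00:00'
GROUP BY DATE(advertised_time), location_signature, activity_type
ON DUPLICATE KEY UPDATE
  delay_minutes = VALUES(delay_minutes),
  train_count   = VALUES(train_count),
  delayed_count = VALUES(delayed_count);

-- The report then aggregates the daily subtotals instead of the raw rows.
SELECT location_signature,
       activity_type,
       SUM(delay_minutes) AS delay_time,
       SUM(train_count)   AS total_train_count,
       SUM(delayed_count) AS delayed_train_count
FROM train_delay_daily
WHERE day BETWEEN '2019-04-01' AND '2019-04-30'
GROUP BY location_signature, activity_type
ORDER BY delay_time DESC
LIMIT 10;
```

The VALUES() form in ON DUPLICATE KEY UPDATE is deprecated in recent MySQL 8.0 releases in favor of a row alias, but it still works and is also available on older versions.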
