Optimizing a MySQL query that uses GROUP BY

Date: 2019-05-11 08:20:09

Tags: mysql sql database query-optimization innodb

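I have an InnoDB table with 11 columns and about 5 million rows. I run a query against it to find the top 10 groups with the highest total delay. The table schema is as follows.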

id (int 11) (primary key)
activity_id (varchar 250)
activity_type (varchar 10)
advertised_time (timestamp)
advertised_train_ident (int 11)
technical_train_ident (int 11)
location_signature (varchar 10)
time_at_location (timestamp)
information_owner (varchar 100)
created_at (timestamp)
updated_at (timestamp)

The indexes on the table are:

id - primary key
location_signature,activity_type, advertised_time - composite index (name is search)
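
For anyone who wants to reproduce the setup, the schema and indexes above correspond roughly to the DDL below. This is only a sketch: nullability and defaults are not stated above, so the `NULL` on the timestamp columns and the `AUTO_INCREMENT` on `id` are assumptions made so the statement runs on a default server configuration.

```sql
-- Approximate DDL for the table described above.
-- Assumptions (not stated in the question): AUTO_INCREMENT on id,
-- nullable timestamp columns, no explicit defaults.
CREATE TABLE train_announcements (
  id                     INT(11)      NOT NULL AUTO_INCREMENT,
  activity_id            VARCHAR(250),
  activity_type          VARCHAR(10),
  advertised_time        TIMESTAMP    NULL,
  advertised_train_ident INT(11),
  technical_train_ident  INT(11),
  location_signature     VARCHAR(10),
  time_at_location       TIMESTAMP    NULL,
  information_owner      VARCHAR(100),
  created_at             TIMESTAMP    NULL,
  updated_at             TIMESTAMP    NULL,
  PRIMARY KEY (id),
  -- The composite index named "search" from the list above.
  KEY search (location_signature, activity_type, advertised_time)
) ENGINE=InnoDB
  DEFAULT CHARSET=utf8mb4
  COLLATE=utf8mb4_unicode_ci;
```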

I am using the following query to pull records from the table above. It takes 10 to 12 seconds to run.

SELECT location_signature, activity_type,  
SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 THEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) else 0 END) as delay_time, 
count(id) as total_train_count, 
SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 THEN 1 ELSE 0 END) as delayed_train_count 
from `train_announcements` 
where `advertised_time` >= '2019-04-01 10:00:00' and `advertised_time` <= '2019-04-30 10:00:00' 
group by `location_signature`, `activity_type` 
order by `delay_time` desc 
limit 10 offset 0;

The EXPLAIN output for this query is:

+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
| id | select_type | table                      | type  | possible_keys | key     | key_len | ref  | rows   | Extra                                        |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | train_announcements        | index | search        | search  | 84      | NULL | 4910024| Using where; Using temporary; Using filesort |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+

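Note that the table's collation is utf8mb4_unicode_ci, because the location_signature column contains special characters.

It would be great if anyone could suggest a way to improve the performance of this query. Thanks in advance.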

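2 answers:

Answer 0: (score: 3)

Looking at your index, make sure advertised_time is in the leftmost position, since it is the only column the WHERE clause filters on.

It may also be useful to add time_at_location to the index, so that the query can be answered from the index alone without accessing the table rows.

Index on table train_announcements

columns (advertised_time, location_signature, activity_type, time_at_location)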
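
As a sketch, an index with that column list could be added with a statement like the one below. The index name `idx_advertised_cover` is a placeholder, not something from the question.

```sql
-- Hypothetical index name; the column order follows the list above,
-- with the range-filtered column (advertised_time) first.
ALTER TABLE train_announcements
  ADD INDEX idx_advertised_cover
    (advertised_time, location_signature, activity_type, time_at_location);
```

On a table of about 5 million rows, building the index may take a while, so it is probably worth trying on a copy first.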

SELECT location_signature
  , activity_type
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 
            THEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) 
            ELSE 0 END) as delay_time
  , count(id) as total_train_count
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 
            THEN 1 
            ELSE 0 END) as delayed_train_count 
from `train_announcements` 
where `advertised_time` BETWEEN '2019-04-01 10:00:00' and '2019-04-30 10:00:00' 
group by `location_signature`, `activity_type` 
order by `delay_time` desc 
limit 10 offset 0;

And if id is never NULL (it is the primary key here, so it cannot be), try using count(*) instead of count(id):

SELECT location_signature
  , activity_type
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 
            THEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) 
            ELSE 0 END) as delay_time
  , count(*) as total_train_count
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 
            THEN 1 
            ELSE 0 END) as delayed_train_count 
from `train_announcements` 
where `advertised_time` BETWEEN '2019-04-01 10:00:00' and '2019-04-30 10:00:00' 
group by `location_signature`, `activity_type` 
order by `delay_time` desc 
limit 10 offset 0;

Or, if you really do need id as well, try adding that column to the composite index:

      (advertised_time, location_signature, activity_type, time_at_location, id )
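
One way to check whether the new index is actually used is to run EXPLAIN on the query again after creating it. This is a sketch of what to look for, not a guaranteed result, since the optimizer's choice depends on the data distribution.

```sql
-- Re-check the plan after adding the index suggested above.
EXPLAIN
SELECT location_signature
  , activity_type
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE, advertised_time, time_at_location) > 0
            THEN TIMESTAMPDIFF(MINUTE, advertised_time, time_at_location)
            ELSE 0 END) AS delay_time
  , COUNT(*) AS total_train_count
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE, advertised_time, time_at_location) > 0
            THEN 1
            ELSE 0 END) AS delayed_train_count
FROM `train_announcements`
WHERE `advertised_time` BETWEEN '2019-04-01 10:00:00' AND '2019-04-30 10:00:00'
GROUP BY `location_signature`, `activity_type`
ORDER BY `delay_time` DESC
LIMIT 10 OFFSET 0;
```

If the optimizer picks the new index, the `key` column should show its name, `type` should be `range` rather than `index`, and `Extra` should include `Using index`. `Using temporary; Using filesort` will most likely remain, because the grouping columns are not a prefix of the index and the sort is on an aggregate.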

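Answer 1: (score: 0)

Build and maintain a summary table, for example one holding daily subtotals. The "report" then runs against this much smaller table, so it will be faster.

More: http://mysql.rjweb.org/doc.php/summarytables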
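
A minimal sketch of such a summary table with daily subtotals follows. The table name `train_delay_daily` and its column names are hypothetical, and it assumes location_signature and activity_type are never NULL, since they are part of the primary key.

```sql
-- Hypothetical summary table: one row per day, location and activity type.
-- The collation matches the base table so grouping behaves the same way.
CREATE TABLE train_delay_daily (
  announce_date       DATE            NOT NULL,
  location_signature  VARCHAR(10)     NOT NULL,
  activity_type       VARCHAR(10)     NOT NULL,
  delay_minutes       BIGINT UNSIGNED NOT NULL,
  total_train_count   INT UNSIGNED    NOT NULL,
  delayed_train_count INT UNSIGNED    NOT NULL,
  PRIMARY KEY (announce_date, location_signature, activity_type)
) ENGINE=InnoDB
  DEFAULT CHARSET=utf8mb4
  COLLATE=utf8mb4_unicode_ci;

-- Refresh one day of subtotals (here 2019-04-01), e.g. from a nightly job.
-- Re-running it for the same day overwrites that day's rows.
INSERT INTO train_delay_daily
  (announce_date, location_signature, activity_type,
   delay_minutes, total_train_count, delayed_train_count)
SELECT DATE(advertised_time)
  , location_signature
  , activity_type
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE, advertised_time, time_at_location) > 0
            THEN TIMESTAMPDIFF(MINUTE, advertised_time, time_at_location)
            ELSE 0 END)
  , COUNT(*)
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE, advertised_time, time_at_location) > 0
            THEN 1
            ELSE 0 END)
FROM train_announcements
WHERE advertised_time >= '2019-04-01 00:00:00'
  AND advertised_time <  '2019-04-02 00:00:00'
GROUP BY DATE(advertised_time), location_signature, activity_type
ON DUPLICATE KEY UPDATE
    delay_minutes       = VALUES(delay_minutes)
  , total_train_count   = VALUES(total_train_count)
  , delayed_train_count = VALUES(delayed_train_count);

-- The report then sums the daily subtotals instead of scanning raw rows.
SELECT location_signature
  , activity_type
  , SUM(delay_minutes)       AS delay_time
  , SUM(total_train_count)   AS total_train_count
  , SUM(delayed_train_count) AS delayed_train_count
FROM train_delay_daily
WHERE announce_date BETWEEN '2019-04-01' AND '2019-04-30'
GROUP BY location_signature, activity_type
ORDER BY delay_time DESC
LIMIT 10;
```

Daily buckets only answer ranges that start and end on day boundaries. The range in the question runs from 10:00 to 10:00, so matching it exactly would need hourly buckets, or the partial first and last days would have to be read from the base table.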