MySQL GROUP BY time interval optimization

Date: 2015-08-16 18:21:13

Tags: mysql sql database optimization indexing

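I have a very large table (hundreds of millions of rows) that stores test results together with a datetime and a foreign key to a related entity called "link". I need to group rows by time intervals of 10, 15, 20, 30 and 60 minutes, and also filter by time and 'link_id'. I know this can be done with the following query, as explained [here] [1]: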

SELECT time,AVG(RTT),MIN(RTT),MAX(RTT),COUNT(*) FROM  trace
WHERE link_id=1 AND time>='2015-01-01' AND time <= '2015-01-30'
GROUP BY UNIX_TIMESTAMP(time) DIV 600;

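This solution works, but it is very slow (about 10 seconds on average), so I tried adding a datetime column for each grouping interval. For example, the row: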

id | time                     | rtt        | link_id
1  | 2014-01-01 12:34:55.4034 | 154.3      | 2

becomes:

id | time                     | rtt        | link_id | time_60                   |time_30 ...
1  | 2014-01-01 12:34:55.4034 | 154.3      | 2       | 2014-01-01 12:00:00.00    | 2014-01-01 12:30:00.00 ...

and I get the intervals with the following query:

SELECT time_10,AVG(RTT),MIN(RTT),MAX(RTT),COUNT(*) FROM  trace
WHERE link_id=1 AND time>='2015-01-01' AND time <= '2015-01-30'
GROUP BY time_10;

This query is at least 50% faster (about 5 seconds on average), but it is still quite slow. How can I optimize this query to run faster?

EXPLAIN output for the query:

+----+-------------+------------+------+------------------------------------------------------------------------+----------------------------------------------------+---------+-------+---------+----------------------------------------------+
| id | select_type | table      | type | possible_keys                                                          | key                                                | key_len | ref   | rows    | Extra                                        |
+----+-------------+------------+------+------------------------------------------------------------------------+----------------------------------------------------+---------+-------+---------+----------------------------------------------+
|  1 | SIMPLE      | main_trace | ref  | main_trace_link_id_c6febb11f84677f_fk_main_link_id,main_trace_e7549e3e | main_trace_link_id_c6febb11f84677f_fk_main_link_id | 4       | const | 1478359 | Using where; Using temporary; Using filesort |
+----+-------------+------------+------+------------------------------------------------------------------------+----------------------------------------------------+---------+-------+---------+----------------------------------------------+

These are the table indexes:

+------------+------------+----------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table      | Non_unique | Key_name                                           | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| main_trace |          0 | PRIMARY                                            |            1 | id          | A         |     2956718 |     NULL | NULL   |      | BTREE      |         |               |
| main_trace |          1 | main_trace_link_id_c6febb11f84677f_fk_main_link_id |            1 | link_id     | A         |           2 |     NULL | NULL   |      | BTREE      |         |               |
| main_trace |          1 | main_trace_07cc694b                                |            1 | time        | A         |     2956718 |     NULL | NULL   |      | BTREE      |         |               |
| main_trace |          1 | main_trace_e7549e3e                                |            1 | time_10     | A         |       22230 |     NULL | NULL   | YES  | BTREE      |         |               |
| main_trace |          1 | main_trace_01af8333                                |            1 | time_15     | A         |       14783 |     NULL | NULL   | YES  | BTREE      |         |               |
| main_trace |          1 | main_trace_1681ff94                                |            1 | time_20     | A         |       10870 |     NULL | NULL   | YES  | BTREE      |         |               |
| main_trace |          1 | main_trace_f7c28c93                                |            1 | time_30     | A         |        6399 |     NULL | NULL   | YES  | BTREE      |         |               |
| main_trace |          1 | main_trace_0f29fcc5                                |            1 | time_60     | A         |        3390 |     NULL | NULL   | YES  | BTREE      |         |               |
+------------+------------+----------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

3 answers:

Answer 0 (score: 1)

For this query:

SELECT time_10, AVG(RTT), MIN(RTT), MAX(RTT), COUNT(*)
FROM  trace
WHERE link_id = 1 AND time >= '2015-01-01' AND time <= '2015-01-30'
GROUP BY time_10;

the best index is a covering index: trace(link_id, time, time_10, rtt)
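As a minimal sketch of what creating that index could look like: the index name idx_trace_link_time_cover is hypothetical, and the table is assumed to be called trace as in the query (the EXPLAIN output in the question shows main_trace, so adjust accordingly). The equality column (link_id) comes first, then the range column (time), and the remaining columns are included only so the query can be answered from the index without touching the table rows.

```sql
-- Hypothetical index name; the column order follows the answer above.
ALTER TABLE trace
  ADD INDEX idx_trace_link_time_cover (link_id, time, time_10, rtt);

-- Re-check the plan afterwards. With a covering index, the Extra column
-- would be expected to include "Using index".
EXPLAIN
SELECT time_10, AVG(rtt), MIN(rtt), MAX(rtt), COUNT(*)
FROM trace
WHERE link_id = 1 AND time >= '2015-01-01' AND time <= '2015-01-30'
GROUP BY time_10;
```

Because the range condition on time comes before time_10 in the index, the grouping step will probably still need a temporary table; the expected gain is from reading only index entries for the matching link and time range.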

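Answer 1 (score: 1)

A composite index on (id, time), potentially followed by an ANALYZE TABLE trace, would make it snappy.

This is just a suggestion; I am not saying you should do it. ANALYZE TABLE can take a long time to run on a table with millions of rows.

Suggesting that you create an index based on just one query is not a good idea. The assumption is that you have other queries, and every extra index slows down inserts and updates.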
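The suggestion above could be sketched as follows. The index name idx_trace_id_time is hypothetical, and the column list is taken literally from the answer; since the query filters on link_id, a (link_id, time) variant may be what was intended, but that is an assumption and not something the answer states.

```sql
-- Composite index exactly as written in the answer (hypothetical index name).
ALTER TABLE trace
  ADD INDEX idx_trace_id_time (id, time);

-- Refresh the index statistics so the optimizer works from current
-- key-distribution estimates.
ANALYZE TABLE trace;
```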

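Answer 2 (score: 1)

time <= '2015-01-30' stops at midnight at the start of January 30, so it excludes nearly all of January 30 and all of January 31; is that what you want? The following pattern works well and avoids many edge cases (e.g., leap years):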

WHERE time >= '2015-01-01'
  AND time  < '2015-01-01' + INTERVAL 1 MONTH
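Applied to the query from the question, the half-open range might look like the sketch below. It assumes the goal is to cover the whole of January 2015, which the question does not state explicitly.

```sql
-- Half-open interval: includes 2015-01-01 00:00:00 and everything up to,
-- but not including, 2015-02-01 00:00:00.
SELECT time_10, AVG(rtt), MIN(rtt), MAX(rtt), COUNT(*)
FROM trace
WHERE link_id = 1
  AND time >= '2015-01-01'
  AND time <  '2015-01-01' + INTERVAL 1 MONTH
GROUP BY time_10;
```

The upper bound is computed from the start date, so the same expression works for months of any length.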

If this is static data (for example, write-once data in a data warehouse), you can make the query much faster by building and maintaining Summary Tables.
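A minimal sketch of such a summary table for the 10-minute interval is shown below. The table name trace_summary_10, its column names, and the column types are all assumptions made for illustration, and rtt is assumed to be non-NULL. The table stores a count and a sum instead of an average, so that averages stay correct when buckets are combined.

```sql
-- Hypothetical summary table: one row per link and 10-minute bucket.
CREATE TABLE trace_summary_10 (
  link_id  INT          NOT NULL,
  time_10  DATETIME     NOT NULL,
  cnt      INT UNSIGNED NOT NULL,
  sum_rtt  DOUBLE       NOT NULL,
  min_rtt  DOUBLE       NOT NULL,
  max_rtt  DOUBLE       NOT NULL,
  PRIMARY KEY (link_id, time_10)
);

-- Build the summary once from the detail table.
INSERT INTO trace_summary_10 (link_id, time_10, cnt, sum_rtt, min_rtt, max_rtt)
SELECT link_id, time_10, COUNT(*), SUM(rtt), MIN(rtt), MAX(rtt)
FROM trace
WHERE time_10 IS NOT NULL
GROUP BY link_id, time_10;

-- Report query: reads at most one row per bucket instead of every
-- measurement in the range.
SELECT time_10, sum_rtt / cnt AS avg_rtt, min_rtt, max_rtt, cnt
FROM trace_summary_10
WHERE link_id = 1
  AND time_10 >= '2015-01-01'
  AND time_10 <  '2015-01-01' + INTERVAL 1 MONTH;
```

Coarser intervals could be derived from the same table by grouping several buckets together and computing SUM(sum_rtt) / SUM(cnt), MIN(min_rtt) and MAX(max_rtt). If new rows keep arriving, the summary also has to be updated as part of the load process.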