I have a very large table (hundreds of millions of rows) that stores test results along with a datetime and a foreign key to a related entity called 'link'. I need to group the rows by time intervals of 10, 15, 20, 30 and 60 minutes, and also filter by time and by `link_id`. I know this can be done with the following query, as described [here][1]:
SELECT time, AVG(RTT), MIN(RTT), MAX(RTT), COUNT(*) FROM trace
WHERE link_id=1 AND time>='2015-01-01' AND time <= '2015-01-30'
GROUP BY UNIX_TIMESTAMP(time) DIV 600;
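Since `UNIX_TIMESTAMP(time)` returns seconds, `DIV 600` collapses rows into 10-minute buckets; the other intervals just change the divisor (900, 1200, 1800 or 3600 seconds). A minimal sketch, assuming MySQL, that also reports each bucket's start time instead of a raw integer:

```sql
-- 15-minute buckets: 900 seconds per bucket.
-- FROM_UNIXTIME(... DIV 900 * 900) turns the bucket number back into
-- the datetime at which the bucket starts.
SELECT FROM_UNIXTIME(UNIX_TIMESTAMP(time) DIV 900 * 900) AS bucket_start,
       AVG(RTT), MIN(RTT), MAX(RTT), COUNT(*)
FROM trace
WHERE link_id = 1
  AND time >= '2015-01-01' AND time <= '2015-01-30'
GROUP BY UNIX_TIMESTAMP(time) DIV 900;
```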
This solution works, but it is very slow (around 10 seconds on average), so I tried adding a datetime column for each grouping interval. For example, a row like:
id | time | rtt | link_id
1 | 2014-01-01 12:34:55.4034 | 154.3 | 2
becomes:
id | time | rtt | link_id | time_60 |time_30 ...
1 | 2014-01-01 12:34:55.4034 | 154.3 | 2 | 2014-01-01 12:00:00.00 | 2014-01-01 12:30:00.00 ...
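One way these extra columns can be backfilled for existing rows is a sketch like the following (assuming MySQL; the arithmetic truncates each timestamp down to the start of its bucket):

```sql
-- Hypothetical backfill: truncate each timestamp to its bucket start.
-- DIV discards the remainder, so DIV 600 * 600 rounds down to the
-- nearest 10 minutes, DIV 1800 * 1800 to the nearest 30 minutes, etc.
UPDATE trace
SET time_10 = FROM_UNIXTIME(UNIX_TIMESTAMP(time) DIV 600  * 600),
    time_30 = FROM_UNIXTIME(UNIX_TIMESTAMP(time) DIV 1800 * 1800),
    time_60 = FROM_UNIXTIME(UNIX_TIMESTAMP(time) DIV 3600 * 3600);
```

On a table this size, such an UPDATE would itself be expensive and would normally be run in batches; keeping the columns in sync at insert time avoids that.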
Then I get the intervals with the following query:
SELECT time_10, AVG(RTT), MIN(RTT), MAX(RTT), COUNT(*) FROM trace
WHERE link_id=1 AND time>='2015-01-01' AND time <= '2015-01-30'
GROUP BY time_10;
This query is at least 50% faster (around 5 seconds on average), but it is still quite slow. How can I optimize this query to make it faster?
EXPLAIN output for the query:
+----+-------------+------------+------+------------------------------------------------------------------------+----------------------------------------------------+---------+-------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+------------------------------------------------------------------------+----------------------------------------------------+---------+-------+---------+----------------------------------------------+
| 1 | SIMPLE | main_trace | ref | main_trace_link_id_c6febb11f84677f_fk_main_link_id,main_trace_e7549e3e | main_trace_link_id_c6febb11f84677f_fk_main_link_id | 4 | const | 1478359 | Using where; Using temporary; Using filesort |
+----+-------------+------------+------+------------------------------------------------------------------------+----------------------------------------------------+---------+-------+---------+----------------------------------------------+
These are the table's indexes:
+------------+------------+----------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| main_trace | 0 | PRIMARY | 1 | id | A | 2956718 | NULL | NULL | | BTREE | | |
| main_trace | 1 | main_trace_link_id_c6febb11f84677f_fk_main_link_id | 1 | link_id | A | 2 | NULL | NULL | | BTREE | | |
| main_trace | 1 | main_trace_07cc694b | 1 | time | A | 2956718 | NULL | NULL | | BTREE | | |
| main_trace | 1 | main_trace_e7549e3e | 1 | time_10 | A | 22230 | NULL | NULL | YES | BTREE | | |
| main_trace | 1 | main_trace_01af8333 | 1 | time_15 | A | 14783 | NULL | NULL | YES | BTREE | | |
| main_trace | 1 | main_trace_1681ff94 | 1 | time_20 | A | 10870 | NULL | NULL | YES | BTREE | | |
| main_trace | 1 | main_trace_f7c28c93 | 1 | time_30 | A | 6399 | NULL | NULL | YES | BTREE | | |
| main_trace | 1 | main_trace_0f29fcc5 | 1 | time_60 | A | 3390 | NULL | NULL | YES | BTREE | | |
+------------+------------+----------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Answer 0 (score: 1)
For this query:
SELECT time_10, AVG(RTT), MIN(RTT), MAX(RTT), COUNT(*)
FROM trace
WHERE link_id = 1 AND time >= '2015-01-01' AND time <= '2015-01-30'
GROUP BY time_10;
The best index is a covering index: `trace(link_id, time, time_10, rtt)`.
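With the columns in that order, the index covers the whole query: `link_id` serves the equality filter, `time` the range filter, and `time_10` and `rtt` let the grouping and aggregates be answered from the index alone, without touching table rows. A sketch of the DDL (the index name is made up here):

```sql
-- Covering index for the time_10 variant of the query; every column
-- the query references is in the index, so no row lookups are needed.
CREATE INDEX idx_trace_link_time_cover
    ON trace (link_id, time, time_10, rtt);
```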
Answer 1 (score: 1)
A composite index on (id, time), followed by a possible `analyze table trace`, would make it snappy.
That is just a suggestion; I am not saying to do it. Analyzing the table could take some time to run against hundreds of millions of rows.
Suggesting index creation based on just one query is not a good idea. The assumption is that you have other queries, and extra indexes drag down inserts/updates.
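As a sketch, this suggestion amounts to the following (the index name is hypothetical):

```sql
-- Composite index as suggested, then refresh the optimizer's
-- statistics; note that ANALYZE TABLE can take a while on a table
-- with hundreds of millions of rows.
CREATE INDEX idx_trace_id_time ON trace (id, time);
ANALYZE TABLE trace;
```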
Answer 2 (score: 1)
`time <= '2015-01-30'`
excludes most of the last day of the month; did you really want that? This pattern works well and avoids lots of end cases (e.g., leap years):
WHERE time >= '2015-01-01'
AND time < '2015-01-01' + INTERVAL 1 MONTH
If this is static data (e.g., a write-once data warehouse), the queries can be made much faster by building and maintaining Summary Tables.
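A minimal summary-table sketch, assuming the table and column names used in the question: aggregate once per (link, 10-minute bucket), then run the reporting queries against this much smaller table instead of `trace`:

```sql
-- One row per link per 10-minute bucket. COUNT and SUM are stored
-- (rather than AVG) so exact averages over larger windows can still
-- be derived as SUM(sum_rtt) / SUM(cnt).
CREATE TABLE trace_summary_10 (
    link_id INT      NOT NULL,
    bucket  DATETIME NOT NULL,
    cnt     INT      NOT NULL,
    sum_rtt DOUBLE   NOT NULL,
    min_rtt DOUBLE   NOT NULL,
    max_rtt DOUBLE   NOT NULL,
    PRIMARY KEY (link_id, bucket)
);

INSERT INTO trace_summary_10
SELECT link_id,
       FROM_UNIXTIME(UNIX_TIMESTAMP(time) DIV 600 * 600),
       COUNT(*), SUM(rtt), MIN(rtt), MAX(rtt)
FROM trace
GROUP BY link_id, UNIX_TIMESTAMP(time) DIV 600;
```

The 20, 30 and 60-minute windows can be rolled up from the 10-minute rows, since they are multiples of 10 minutes; the 15-minute window does not align with 10-minute buckets and would need its own summary table (or 5-minute base buckets).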