MySQL query optimizer: subquery using partitions performs better than left join using an index

Time: 2015-06-28 15:07:48

Tags: mysql sql indexing partitioning myisam

Using MySQL version 5.6.14-enterprise-commercial-advanced-log on Ubuntu 12.04 LTS, I see the following behavior when querying data from these tables:

CREATE TABLE `a` (
  `id` varchar(32) DEFAULT NULL,
  `request_time` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00'
) ENGINE=MyISAM DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(request_time))
(PARTITION p15062116 VALUES LESS THAN (1434895200) ENGINE = MyISAM,
 PARTITION p15062117 VALUES LESS THAN (1434898800) ENGINE = MyISAM,
 PARTITION p15062118 VALUES LESS THAN (1434902400) ENGINE = MyISAM,
 ...
 PARTITION rest VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */

CREATE TABLE `b` (
  `id` varchar(50) NOT NULL,
  `start_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `item` int(11) DEFAULT NULL,
  `item2` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`,`start_time`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(start_time))
(PARTITION p15062516 VALUES LESS THAN (1435240800) ENGINE = MyISAM,
 PARTITION p15062517 VALUES LESS THAN (1435244400) ENGINE = MyISAM
 ....
 PARTITION rest VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */

This query runs in about 1 second:

    SELECT SQL_NO_CACHE request_time, item
    FROM a left join
      (select *
       from b
       where start_time between '2015-06-28 10:00:00' and '2015-06-28 11:00:00'
      ) c using(id)
    where request_time between '2015-06-28 10:00:00' and '2015-06-28 11:00:00'

EXPLAIN output:

+----+-------------+------------+------+---------------+-------------+---------+------+--------+-------------+
| id | select_type | table      | type | possible_keys | key         | key_len | ref  | rows   | Extra       |
+----+-------------+------------+------+---------------+-------------+---------+------+--------+-------------+
|  1 | PRIMARY     | a          | ALL  | NULL          | NULL        | NULL    | NULL | 336972 | Using where |
|  1 | PRIMARY     | <derived2> | ref  | <auto_key0>   | <auto_key0> | 152     | func |     10 | Using where |
|  2 | DERIVED     | b          | ALL  | NULL          | NULL        | NULL    | NULL |  39508 | Using where |
+----+-------------+------------+------+---------------+-------------+---------+------+--------+-------------+
3 rows in set (0.00 sec)

mysql> explain partitions SELECT SQL_NO_CACHE request_time, item  FROM a  left join (select * from b where start_time between '2015-06-28 10:00:00' and '2015-06-28 11:00:00') b using(id) where request_time between '2015-06-28 10:00:
+----+-------------+------------+---------------------+------+---------------+-------------+---------+------+--------+-------------+
| id | select_type | table      | partitions          | type | possible_keys | key         | key_len | ref  | rows   | Extra       |
+----+-------------+------------+---------------------+------+---------------+-------------+---------+------+--------+-------------+
|  1 | PRIMARY     | a          | p15062810,p15062811 | ALL  | NULL          | NULL        | NULL    | NULL | 336972 | Using where |
|  1 | PRIMARY     | <derived2> | NULL                | ref  | <auto_key0>   | <auto_key0> | 152     | func |     10 | Using where |
|  2 | DERIVED     | b          | p15062810,p15062811 | ALL  | NULL          | NULL        | NULL    | NULL |  39508 | Using where |
+----+-------------+------------+---------------------+------+---------------+-------------+---------+------+--------+-------------+

This query, by contrast, takes about 30 seconds:

SELECT SQL_NO_CACHE request_time, item
FROM a
left join b using(id)
where request_time between '2015-06-28 10:00:00' and '2015-06-28 11:00:00' 
and (start_time between '2015-06-28 10:00:00' and '2015-06-28 11:00:00' or start_time is null) ;

EXPLAIN output:

+----+-------------+-----------+------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table     | type | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+-----------+------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | a         | ALL  | NULL          | NULL    | NULL    | NULL | 336972 | Using where              |
|  1 | SIMPLE      | b         | ref  | PRIMARY       | PRIMARY | 152     | func |    395 | Using where; Using index |
+----+-------------+-----------+------+---------------+---------+---------+------+--------+--------------------------+
mysql> explain partitions SELECT SQL_NO_CACHE request_time, item  FROM a  left join b using(id) where request_time between '2015-06-28 10:00:00' and '2015-06-28 11:00:00' and start_time between '2015-06-28 10:00:00' and '2015-06-28 11:
+----+-------------+-----------+---------------------+------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table     | partitions          | type | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+-----------+---------------------+------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | a         | p15062810,p15062811 | ALL  | NULL          | NULL    | NULL    | NULL | 336972 | Using where              |
|  1 | SIMPLE      | b         | p15062810,p15062811 | ref  | PRIMARY       | PRIMARY | 152     | func |    395 | Using where; Using index |
+----+-------------+-----------+---------------------+------+---------------+---------+---------+------+--------+--------------------------+

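I expected similar or better results from the second query, since it can use the index on id and the 1-hour partitions. Both tables have 1,000,000 records.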

Can you explain why the first query is more efficient than the second?

Can the first query be restructured so that it becomes a view or a reusable query, instead of rebuilding the subquery for every join?

Thanks

1 answer:

Answer 0 (score: 0)

My guess is that the SQL engine has a hard time determining the partitions for the second query. You could try writing it as:

SELECT SQL_NO_CACHE count(*)
FROM a left join
     b
     using(id)
where request_time between '2015-06-28 10:00:00' and '2015-06-28 11:00:00' and
      start_time between '2015-06-28 10:00:00' and  '2015-06-28 11:00:00'
UNION ALL
SELECT SQL_NO_CACHE count(*)
FROM a left join
     b
     using (id)
WHERE request_time between '2015-06-28 10:00:00' and '2015-06-28 11:00:00' and
      start_time is null ;

I realize this returns two rows. If this helps the engine find the right partitions, adding the two values together is easy.
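As a rough, untested sketch of that last step, the two counts could be summed by wrapping the UNION ALL in a derived table. The aliases `cnt`, `parts`, and `total` are made up for illustration, and whether this form keeps any partition-pruning benefit would need to be checked with EXPLAIN PARTITIONS:

```sql
-- Sum the two per-branch counts into a single row.
-- cnt, parts and total are illustrative names, not from the question.
SELECT SUM(cnt) AS total
FROM (
    -- Rows of a that have a matching b row inside the time window
    SELECT COUNT(*) AS cnt
    FROM a LEFT JOIN b USING (id)
    WHERE request_time BETWEEN '2015-06-28 10:00:00' AND '2015-06-28 11:00:00'
      AND start_time BETWEEN '2015-06-28 10:00:00' AND '2015-06-28 11:00:00'
    UNION ALL
    -- Rows of a that have no matching b row at all
    SELECT COUNT(*) AS cnt
    FROM a LEFT JOIN b USING (id)
    WHERE request_time BETWEEN '2015-06-28 10:00:00' AND '2015-06-28 11:00:00'
      AND start_time IS NULL
) AS parts;
```

SQL_NO_CACHE is left out of the sketch; for timing runs it can be added back to the outer SELECT.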