Why can the same query take up to 1000 times longer depending on the WHERE clause?

Time: 2014-10-18 10:47:19

Tags: mysql sqlperformance gtfs

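I have a fairly large query that uses joins to collect a lot of information across several tables. The database holds GTFS bus information from a city's public transportation system.

I run the same query with different WHERE clauses, and the time taken ranges from 200 milliseconds to 200 seconds.

If you don't need the explanation, scroll straight down to "The problem".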

The database

The tables are:

  • routes
  • trips: linked to routes using route_id
  • stop_times: connected to trips using trip_id
  • stops: linked to stop_times using stop_id
  • stop_connections: connects two stop_ids

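My goal is to select journeys that use 2 connections. Here is how my query looks on paper:

[Image: query schema written on paper]

Explanations:

  • Black information is the tables, one table type per line (i.e. the top line is the stops table).
  • Red information is the aliases in my query (s is stops, st is stop_times, t is trips, a is arrival, d is departure, 1/2/3 is the journey index).
  • Green information is the list of conditions on each table.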

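Basically it is:

  1. [s1d] Start from a given stop_id
  2. [st1d] Get the departure times of trips leaving from that stop
  3. [t1] Limit these trips to the route_id that we want
  4. [st1a] Get the arrival times at the following stops
  5. [s1a] Get the stop information (stop name)
  6. [cs1] Connect this stop to all other stops within walking distance
  7. Repeat this 3 times to get 3 journeys (2 connections), and filter the arrival stops down to the ones I want.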

    The query

    select
      s1d.stop_id as s1d_id, s1d.stop_name as s1d_name, s1d.stop_lat as s1d_lat, s1d.stop_lon as s1d_lon,
      st1d.departure_time as st1d_dep,
      t1.trip_id as t1_id, t1.trip_headsign as t1_headsign, t1.route_id as t1_route, t1.direction_id as t1_dir,
      st1a.departure_time as st1a_dep,
      s1a.stop_id as s1a_id, s1a.stop_name as s1a_name, s1a.stop_lat as s1a_lat, s1a.stop_lon as s1a_lon,
      cs1.from_stop_id, cs1.to_stop_id,
      s2d.stop_id as s2d_id, s2d.stop_name as s2d_name, s2d.stop_lat as s2d_lat, s2d.stop_lon as s2d_lon,
      st2d.departure_time as st2d_dep,
      t2.trip_id as t2_id, t2.trip_headsign as t2_headsign, t2.route_id as t2_route, t2.direction_id as t2_dir,
      st2a.departure_time as st2a_dep,
      s2a.stop_id as s2a_id, s2a.stop_name as s2a_name, s2a.stop_lat as s2a_lat, s2a.stop_lon as s2a_lon,
      cs2.from_stop_id, cs2.to_stop_id,
      s3d.stop_id as s3d_id, s3d.stop_name as s3d_name, s3d.stop_lat as s3d_lat, s3d.stop_lon as s3d_lon,
      st3d.departure_time as st3d_dep,
      t3.trip_id as t3_id, t3.trip_headsign as t3_headsign, t3.route_id as t3_route, t3.direction_id as t3_dir,
      st3a.departure_time as st3a_dep,
      s3a.stop_id as s3a_id, s3a.stop_name as s3a_name, s3a.stop_lat as s3a_lat, s3a.stop_lon as s3a_lon
    from stops s1d
    left join stop_times st1d on st1d.stop_id = s1d.stop_id
      and st1d.departure_time > '07:33:00'
      and st1d.departure_time < '08:33:00'
    left join trips t1 on t1.trip_id = st1d.trip_id
      and t1.service_id in (select service_id from calendar where start_date <= 20141020 and end_date >= 20141020 and monday = 1)
      and t1.route_id in ('11-0')
    left join stop_times st1a on st1a.trip_id = t1.trip_id
      and st1a.departure_time > st1d.departure_time
    left join stops s1a on s1a.stop_id = st1a.stop_id
    left join stop_connections cs1 on cs1.from_stop_id = st1a.stop_id
    left join stops s2d on s2d.stop_id = cs1.to_stop_id
    left join stop_times st2d on st2d.stop_id = s2d.stop_id
      and st2d.departure_time > addtime(st1a.departure_time, '00:03:00')
      and st2d.departure_time < addtime(st1a.departure_time, '01:03:00')
    left join trips t2 on t2.trip_id = st2d.trip_id
      and t2.service_id in (select service_id from calendar where start_date <= 20141020 and end_date >= 20141020 and monday = 1)
      and t2.route_id in ('3-0', 'NA-0', '4-0', '2-0')
    left join stop_times st2a on st2a.trip_id = t2.trip_id
      and st2a.departure_time > st2d.departure_time
    left join stops s2a on s2a.stop_id = st2a.stop_id
    left join stop_connections cs2 on cs2.from_stop_id = st2a.stop_id
    left join stops s3d on s3d.stop_id = cs2.to_stop_id
    left join stop_times st3d on st3d.stop_id = s3d.stop_id
      and st3d.departure_time > addtime(st2a.departure_time, '00:03:00')
      and st3d.departure_time < addtime(st2a.departure_time, '01:03:00')
    left join trips t3 on t3.trip_id = st3d.trip_id
      and t3.service_id in (select service_id from calendar where start_date <= 20141020 and end_date >= 20141020 and monday = 1)
      and t3.route_id in ('36-0', '30-0', '97-0')
    left join stop_times st3a on st3a.trip_id = t3.trip_id
      and st3a.departure_time > st3d.departure_time
      and st3a.stop_id in ('StopPoint:CLBO2', 'StopArea:CLBO', 'StopPoint:CLBO1', 'StopPoint:PLTI2',
                           'StopPoint:LCBU2', 'StopArea:LCBU', 'StopPoint:LCBU1', 'StopPoint:MHDI2',
                           'StopPoint:BILE2', 'StopArea:MHDI', 'StopPoint:MHDI1', 'StopPoint:MREZ2',
                           'StopArea:MRDI', 'StopPoint:MRDI1', 'StopArea:SORI', 'StopPoint:SORI1',
                           'StopArea:MREZ', 'StopPoint:MREZ1', 'StopPoint:SORI2', 'StopArea:BILE',
                           'StopPoint:BILE1', 'StopPoint:MRDI2', 'StopArea:PLTI', 'StopPoint:PLTI1',
                           'StopPoint:SEIL3', 'StopPoint:SEIL2', 'StopArea:SEIL', 'StopPoint:SEIL1')
    left join stops s3a on s3a.stop_id = st3a.stop_id
    where s1d.stop_id = 'StopPoint:DEMO1'
    group by s1d_id, s3a_id
    having s3a_id is not null
    order by s1d_id asc, st1d_dep asc, st1a_dep asc, s1a_id asc, s2d_id asc, st2d_dep asc,
             st2a_dep asc, s2a_id asc, s3d_id asc, st3d_dep asc, st3a_dep asc, s3a_id asc

    The problem

    I ran this query twice; the only difference is the final WHERE clause:

    • where s1d.stop_id = 'StopPoint:DEMO1': 13 rows (2 min 58.81 sec)
    • where s1d.stop_id = 'StopPoint:ECTE2': empty set (0.25 sec)

    This seems very strange to me. Here are the EXPLAIN outputs for the two queries:

    Departing DEMO1 (13 results, slow)

    Using EXPLAIN SELECT…

    +----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+-------------+
    | id | select_type | table    | type   | possible_keys                             | key              | key_len | ref                              | rows | Extra       |
    +----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+-------------+
    |  1 | SIMPLE      | s1d      | ALL    | NULL                                      | NULL             | NULL    | NULL                             | 3411 | NULL        |
    |  1 | SIMPLE      | st1d     | ref    | st_stop_id_idx                            | st_stop_id_idx   | 302     | bicou_gtfs_nantes.s1d.stop_id    |  163 | Using where |
    |  1 | SIMPLE      | t1       | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY          | 302     | bicou_gtfs_nantes.st1d.trip_id   |    1 | Using where |
    |  1 | SIMPLE      | calendar | eq_ref | PRIMARY,service_id                        | PRIMARY          | 302     | bicou_gtfs_nantes.t1.service_id  |    1 | Using where |
    |  1 | SIMPLE      | st1a     | ref    | st_trip_id_idx                            | st_trip_id_idx   | 302     | bicou_gtfs_nantes.t1.trip_id     |   14 | Using where |
    |  1 | SIMPLE      | s1a      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.st1a.stop_id   |    1 | NULL        |
    |  1 | SIMPLE      | cs1      | ref    | from_to_stop_ids                          | from_to_stop_ids | 302     | bicou_gtfs_nantes.st1a.stop_id   |    1 | Using index |
    |  1 | SIMPLE      | s2d      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.cs1.to_stop_id |    1 | NULL        |
    |  1 | SIMPLE      | st2d     | ref    | st_stop_id_idx                            | st_stop_id_idx   | 302     | bicou_gtfs_nantes.s2d.stop_id    |  163 | Using where |
    |  1 | SIMPLE      | t2       | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY          | 302     | bicou_gtfs_nantes.st2d.trip_id   |    1 | Using where |
    |  1 | SIMPLE      | calendar | eq_ref | PRIMARY,service_id                        | PRIMARY          | 302     | bicou_gtfs_nantes.t2.service_id  |    1 | Using where |
    |  1 | SIMPLE      | st2a     | ref    | st_trip_id_idx                            | st_trip_id_idx   | 302     | bicou_gtfs_nantes.t2.trip_id     |   14 | Using where |
    |  1 | SIMPLE      | s2a      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.st2a.stop_id   |    1 | NULL        |
    |  1 | SIMPLE      | cs2      | ref    | from_to_stop_ids                          | from_to_stop_ids | 302     | bicou_gtfs_nantes.st2a.stop_id   |    1 | Using index |
    |  1 | SIMPLE      | s3d      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.cs2.to_stop_id |    1 | NULL        |
    |  1 | SIMPLE      | st3d     | ref    | st_stop_id_idx                            | st_stop_id_idx   | 302     | bicou_gtfs_nantes.s3d.stop_id    |  163 | Using where |
    |  1 | SIMPLE      | t3       | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY          | 302     | bicou_gtfs_nantes.st3d.trip_id   |    1 | Using where |
    |  1 | SIMPLE      | calendar | eq_ref | PRIMARY,service_id                        | PRIMARY          | 302     | bicou_gtfs_nantes.t3.service_id  |    1 | Using where |
    |  1 | SIMPLE      | st3a     | ref    | st_stop_id_idx,st_trip_id_idx             | st_trip_id_idx   | 302     | bicou_gtfs_nantes.t3.trip_id     |   14 | Using where |
    |  1 | SIMPLE      | s3a      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.st3a.stop_id   |    1 | NULL        |
    +----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+-------------+
    

    Departing ECTE2 (0 results, fast)

    Using EXPLAIN EXTENDED…

    +----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+----------+---------------------------------+
    | id | select_type | table    | type   | possible_keys                             | key              | key_len | ref                              | rows | filtered | Extra                           |
    +----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+----------+---------------------------------+
    |  1 | SIMPLE      | s1d      | const  | PRIMARY                                   | PRIMARY          | 302     | const                            |    1 |   100.00 | Using temporary; Using filesort |
    |  1 | SIMPLE      | st1d     | ref    | st_stop_id_idx                            | st_stop_id_idx   | 302     | const                            |  234 |   100.00 | Using where                     |
    |  1 | SIMPLE      | t1       | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY          | 302     | bicou_gtfs_nantes.st1d.trip_id   |    1 |   100.00 | Using where                     |
    |  1 | SIMPLE      | calendar | eq_ref | PRIMARY,service_id                        | PRIMARY          | 302     | bicou_gtfs_nantes.t1.service_id  |    1 |   100.00 | Using where                     |
    |  1 | SIMPLE      | st1a     | ref    | st_trip_id_idx                            | st_trip_id_idx   | 302     | bicou_gtfs_nantes.t1.trip_id     |   14 |   100.00 | Using where                     |
    |  1 | SIMPLE      | s1a      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.st1a.stop_id   |    1 |   100.00 | NULL                            |
    |  1 | SIMPLE      | cs1      | ref    | from_to_stop_ids                          | from_to_stop_ids | 302     | bicou_gtfs_nantes.st1a.stop_id   |    1 |   100.00 | Using index                     |
    |  1 | SIMPLE      | s2d      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.cs1.to_stop_id |    1 |   100.00 | NULL                            |
    |  1 | SIMPLE      | st2d     | ref    | st_stop_id_idx                            | st_stop_id_idx   | 302     | bicou_gtfs_nantes.s2d.stop_id    |  163 |   100.00 | Using where                     |
    |  1 | SIMPLE      | t2       | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY          | 302     | bicou_gtfs_nantes.st2d.trip_id   |    1 |   100.00 | Using where                     |
    |  1 | SIMPLE      | calendar | eq_ref | PRIMARY,service_id                        | PRIMARY          | 302     | bicou_gtfs_nantes.t2.service_id  |    1 |   100.00 | Using where                     |
    |  1 | SIMPLE      | st2a     | ref    | st_trip_id_idx                            | st_trip_id_idx   | 302     | bicou_gtfs_nantes.t2.trip_id     |   14 |   100.00 | Using where                     |
    |  1 | SIMPLE      | s2a      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.st2a.stop_id   |    1 |   100.00 | NULL                            |
    |  1 | SIMPLE      | cs2      | ref    | from_to_stop_ids                          | from_to_stop_ids | 302     | bicou_gtfs_nantes.st2a.stop_id   |    1 |   100.00 | Using index                     |
    |  1 | SIMPLE      | s3d      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.cs2.to_stop_id |    1 |   100.00 | NULL                            |
    |  1 | SIMPLE      | st3d     | ref    | st_stop_id_idx                            | st_stop_id_idx   | 302     | bicou_gtfs_nantes.s3d.stop_id    |  163 |   100.00 | Using where                     |
    |  1 | SIMPLE      | t3       | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY          | 302     | bicou_gtfs_nantes.st3d.trip_id   |    1 |   100.00 | Using where                     |
    |  1 | SIMPLE      | calendar | eq_ref | PRIMARY,service_id                        | PRIMARY          | 302     | bicou_gtfs_nantes.t3.service_id  |    1 |   100.00 | Using where                     |
    |  1 | SIMPLE      | st3a     | ref    | st_stop_id_idx,st_trip_id_idx             | st_trip_id_idx   | 302     | bicou_gtfs_nantes.t3.trip_id     |   14 |   100.00 | Using where                     |
    |  1 | SIMPLE      | s3a      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.st3a.stop_id   |    1 |   100.00 | NULL                            |
    +----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+----------+---------------------------------+

    Using EXPLAIN SELECT…

    +----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+---------------------------------+
    | id | select_type | table    | type   | possible_keys                             | key              | key_len | ref                              | rows | Extra                           |
    +----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+---------------------------------+
    |  1 | SIMPLE      | s1d      | const  | PRIMARY                                   | PRIMARY          | 302     | const                            |    1 | Using temporary; Using filesort |
    |  1 | SIMPLE      | st1d     | ref    | st_stop_id_idx                            | st_stop_id_idx   | 302     | const                            |  234 | Using where                     |
    |  1 | SIMPLE      | t1       | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY          | 302     | bicou_gtfs_nantes.st1d.trip_id   |    1 | Using where                     |
    |  1 | SIMPLE      | calendar | eq_ref | PRIMARY,service_id                        | PRIMARY          | 302     | bicou_gtfs_nantes.t1.service_id  |    1 | Using where                     |
    |  1 | SIMPLE      | st1a     | ref    | st_trip_id_idx                            | st_trip_id_idx   | 302     | bicou_gtfs_nantes.t1.trip_id     |   14 | Using where                     |
    |  1 | SIMPLE      | s1a      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.st1a.stop_id   |    1 | NULL                            |
    |  1 | SIMPLE      | cs1      | ref    | from_to_stop_ids                          | from_to_stop_ids | 302     | bicou_gtfs_nantes.st1a.stop_id   |    1 | Using index                     |
    |  1 | SIMPLE      | s2d      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.cs1.to_stop_id |    1 | NULL                            |
    |  1 | SIMPLE      | st2d     | ref    | st_stop_id_idx                            | st_stop_id_idx   | 302     | bicou_gtfs_nantes.s2d.stop_id    |  163 | Using where                     |
    |  1 | SIMPLE      | t2       | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY          | 302     | bicou_gtfs_nantes.st2d.trip_id   |    1 | Using where                     |
    |  1 | SIMPLE      | calendar | eq_ref | PRIMARY,service_id                        | PRIMARY          | 302     | bicou_gtfs_nantes.t2.service_id  |    1 | Using where                     |
    |  1 | SIMPLE      | st2a     | ref    | st_trip_id_idx                            | st_trip_id_idx   | 302     | bicou_gtfs_nantes.t2.trip_id     |   14 | Using where                     |
    |  1 | SIMPLE      | s2a      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.st2a.stop_id   |    1 | NULL                            |
    |  1 | SIMPLE      | cs2      | ref    | from_to_stop_ids                          | from_to_stop_ids | 302     | bicou_gtfs_nantes.st2a.stop_id   |    1 | Using index                     |
    |  1 | SIMPLE      | s3d      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.cs2.to_stop_id |    1 | NULL                            |
    |  1 | SIMPLE      | st3d     | ref    | st_stop_id_idx                            | st_stop_id_idx   | 302     | bicou_gtfs_nantes.s3d.stop_id    |  163 | Using where                     |
    |  1 | SIMPLE      | t3       | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY          | 302     | bicou_gtfs_nantes.st3d.trip_id   |    1 | Using where                     |
    |  1 | SIMPLE      | calendar | eq_ref | PRIMARY,service_id                        | PRIMARY          | 302     | bicou_gtfs_nantes.t3.service_id  |    1 | Using where                     |
    |  1 | SIMPLE      | st3a     | ref    | st_stop_id_idx,st_trip_id_idx             | st_trip_id_idx   | 302     | bicou_gtfs_nantes.t3.trip_id     |   14 | Using where                     |
    |  1 | SIMPLE      | s3a      | eq_ref | PRIMARY                                   | PRIMARY          | 302     | bicou_gtfs_nantes.st3a.stop_id   |    1 | NULL                            |
    +----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+---------------------------------+

    Apparently the engine does not process the two queries the same way. Why it does so is a different question.

    I don't understand why the engine correctly uses the index and key when there is no data, yet when there is data (13 rows) it does not use them on s1d and scans about 3 thousand rows instead of 1.

    Is there any way to force the engine to use keys on a specific table? Also, why does the engine behave like this?

    Environment:

    • OS: Mac OS X 10.10
    • SQL client: mysql Ver 14.14 Distrib 5.6.17, for osx10.6 (i386) using EditLine wrapper
    • SQL server: 5.6.21 MySQL Community Server (GPL)
    • Hardware: MacBook Air, Intel Core i7, 8 GB RAM, 256 GB SSD (should be fast)

    Table sizes:

    Number of rows after each joined table:

    The s1d object comes from the stops table:

    CREATE TABLE IF NOT EXISTS `stops` (
      `stop_id` VARCHAR(100) NOT NULL,
      `stop_code` VARCHAR(50) NULL DEFAULT NULL,
      `stop_name` VARCHAR(255) NOT NULL,
      `stop_desc` VARCHAR(255) NULL DEFAULT NULL,
      `stop_lat` DECIMAL(10,6) NOT NULL,
      `stop_lon` DECIMAL(10,6) NOT NULL,
      `zone_id` VARCHAR(255) NULL DEFAULT NULL,
      `stop_url` VARCHAR(255) NULL DEFAULT NULL,
      `location_type` VARCHAR(2) NULL DEFAULT NULL,
      `parent_station` VARCHAR(100) NOT NULL,
      `stop_timezone` VARCHAR(50) NULL DEFAULT NULL,
      `wheelchair_boarding` TINYINT(1) NULL DEFAULT NULL,
      PRIMARY KEY (`stop_id`),
      INDEX `zone_id` (`zone_id` ASC),
      INDEX `stop_lat` (`stop_lat` ASC),
      INDEX `stop_lon` (`stop_lon` ASC),
      INDEX `location_type` (`location_type` ASC),
      INDEX `parent_station` (`parent_station` ASC),
      CONSTRAINT `stop_parent_station`
        FOREIGN KEY (`parent_station`)
        REFERENCES `stops` (`stop_id`)
        ON DELETE NO ACTION
        ON UPDATE NO ACTION)
    ENGINE = InnoDB
    DEFAULT CHARACTER SET = utf8
    

1 answer:

Answer 0 (score: 1)

Two ideas come to mind:

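1) Make sure the indexes involved are BTREE indexes. HASH indexes work well for equality comparisons but not for range conditions such as the departure_time bounds. Note that HASH is the default only for storage engines such as MEMORY; InnoDB tables like stops use BTREE by default.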
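
One way to check which index type each table actually uses, followed by a sketch of adding a BTREE index. The table and column names come from the question's query; the index name st_stop_dep_idx and its column choice are assumptions made for illustration, not something the question or this answer specifies.

```sql
-- The Index_type column of the result shows BTREE or HASH for each index
SHOW INDEX FROM stops;
SHOW INDEX FROM stop_times;

-- Hypothetical composite index (name and columns are assumptions):
-- stop_id serves the equality join, departure_time the range condition
CREATE INDEX st_stop_dep_idx USING BTREE
    ON stop_times (stop_id, departure_time);
```

Since the stops definition in the question declares ENGINE = InnoDB, SHOW INDEX will most likely report BTREE already, in which case the first two statements only serve as a sanity check.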

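2) Look at what the optimizer does with your queries. Run

EXPLAIN EXTENDED SELECT ...

on both queries. This produces warnings that contain the query optimizer's output. You should see some differences there.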
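
A minimal sketch of that workflow on MySQL 5.6, using a shortened version of the question's query (only the first join is kept, so this is not the asker's full statement). The last statement adds a FORCE INDEX hint as a separate experiment; it is not part of the suggestion above, and whether it changes the plan here is untested.

```sql
-- 1. Request the extended plan for the slow variant
EXPLAIN EXTENDED
select s1d.stop_id, st1d.departure_time
from stops s1d
left join stop_times st1d
       on st1d.stop_id = s1d.stop_id
      and st1d.departure_time > '07:33:00'
      and st1d.departure_time < '08:33:00'
where s1d.stop_id = 'StopPoint:DEMO1';

-- 2. Run immediately afterwards: the Message column holds the
--    statement as rewritten by the optimizer
SHOW WARNINGS;

-- 3. Repeat steps 1 and 2 with 'StopPoint:ECTE2' and compare both outputs

-- 4. Experiment: ask the optimizer to use the primary key on stops
EXPLAIN
select s1d.stop_id, st1d.departure_time
from stops s1d force index (PRIMARY)
left join stop_times st1d
       on st1d.stop_id = s1d.stop_id
      and st1d.departure_time > '07:33:00'
      and st1d.departure_time < '08:33:00'
where s1d.stop_id = 'StopPoint:DEMO1';
```

If the hinted plan shows type const for s1d where the unhinted one shows ALL, the same hint can be tried on the full query.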