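Why is UNION so much faster than a LEFT JOIN with OR?

Time: 2019-03-01 16:58:11

Tags: mysql sql

I have a fairly complex query that I would really like to write with LEFT JOINs and no UNION statements, but it runs too slowly. Even after simplifying it to isolate the problem, I cannot see why one version should run so much faster than the other.

I am using MySQL version 5.6.36-82.1-log.

Is there any way to optimize this query without using UNION?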

select SQL_NO_CACHE distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'

Run time: 13.422 seconds

When I split it up and use UNION, it is much faster:

(select SQL_NO_CACHE distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id` 
where `cities`.`name` = 'New York')
union
(select distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id` 
where `locations`.`description` like '%New York%')

Run time: 0.219 seconds

If I change the LEFT JOINs to (inner) JOINs, it is also fast (but it omits locations that have no address):

select SQL_NO_CACHE distinct `locations`.* from `locations` 
join `location_address` on `location_address`.`location_id` = `locations`.`id` 
join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'

Run time: 0.219 seconds

Adding the `cities`.`name` condition to the LEFT JOIN does not help either:

select SQL_NO_CACHE distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id` AND `cities`.`name` = 'New York'
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'

Run time: 13.812 seconds

The row counts for each table are:

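  • locations: ~5,000 rows
  • location_address: ~4,900 rows (~100 locations have 2 entries, ~200 locations have 0)
  • addresses: ~5,500 rows (~600 addresses are linked from other tables)
  • cities: ~30,000 rows (using a full city database for the US)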

The id field on each table is the primary key, and `cities`.`name` is also indexed. `locations`.`description` is a long text field.

Here are some sample structures and data:

locations

+----+----------------------+
| id | description          |
+----+----------------------+
| 1  | Somewhere out there  |
+----+----------------------+
| 2  | In New York          |
+----+----------------------+
| 3  | Elsewhere            |
+----+----------------------+

location_address

+----+-------------+------------+
| id | location_id | address_id |
+----+-------------+------------+
| 1  | 1           | 1          |
+----+-------------+------------+
| 2  | 1           | 2          |
+----+-------------+------------+
| 3  | 3           | 3          |
+----+-------------+------------+

addresses

+----+---------+
| id | city_id |
+----+---------+
| 1  | 1       |
+----+---------+
| 2  | 2       |
+----+---------+
| 3  | 2       |
+----+---------+

cities

+----+-----------+
| id | name      |
+----+-----------+
| 1  | New York  |
+----+-----------+
| 2  | Chicago   |
+----+-----------+
| 3  | Houston   |
+----+-----------+

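I would really like to avoid UNION, because I have many conditional filters and sometimes have to leave out part of the union when I only want locations that have addresses. Using UNION would also add a lot of complexity to my query-building code. I would also like to avoid subqueries.

3 answers:

Answer 0 (score: 1)

You could write the query like this: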

select *
from
(
    Select <sql statement a>
    UNION
    Select <sql statement b>
) x
where x. <extra where clauses here>

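You would put only the minimal restricting clauses in the two inner selects of the union, then apply the additional restrictions to the combined result. I think this gives the most flexibility.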
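As a rough sketch of how that template might look with the tables from the question (the aliases and the outer filter on `x.id` are illustrative assumptions, not part of the original answer):

```sql
-- Sketch only: each branch carries one search condition;
-- filters shared by both branches go on the outer query.
select x.*
from
(
    -- Branch a: locations matched through their city.
    -- Inner joins are enough here, because the WHERE clause
    -- discards rows with no matching city anyway.
    select l.*
    from locations l
    join location_address la on la.location_id = l.id
    join addresses a on la.address_id = a.id
    join cities c on a.city_id = c.id
    where c.name = 'New York'
    union
    -- Branch b: locations matched through their description.
    select l.*
    from locations l
    where l.description like '%New York%'
) x
-- Hypothetical extra restriction applied to the combined result.
where x.id > 0;
```

UNION (as opposed to UNION ALL) removes rows that appear in both branches, so no DISTINCT is needed inside the branches as long as `locations`.`id` is unique.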

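Answer 1 (score: 0)

If you look at the execution plans, you will see that they differ. The likely explanation is that the indexes can be used more effectively for the two separate subqueries, and database optimizers are notoriously poor at optimizing OR conditions.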
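One way to compare the plans is to prefix each version of the query with EXPLAIN. A minimal sketch for the slow OR version from the question:

```sql
-- Show the plan MySQL chooses for the OR version.
-- Run the same EXPLAIN against the UNION version and compare.
EXPLAIN
select distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%';
```

In the output, the `type`, `key`, and `rows` columns are the ones to watch: a row with `type` = ALL and `key` = NULL means that table is read with a full scan for every row combination coming from the tables before it.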

By the way, how does this version perform?

select SQL_NO_CACHE l.*
from locations l
where exists (select 1
              from location_address la join
                   addresses a
                   on la.address_id = a.id join
                   cities c
                   on a.city_id = c.id
              where la.location_id = l.id and c.name = 'New York'
             ) or
     l.description like '%New York%';

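You should be able to optimize this subquery so that it runs quickly. It also avoids the overhead of removing duplicates.

For performance, you want indexes on location_address(location_id), addresses(id, city_id), and cities(id, name).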
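The indexes suggested above could be created roughly as follows. The index names are illustrative assumptions; only the table and column names come from the question.

```sql
-- Lets the correlated subquery find a location's rows
-- in the pivot table without scanning it.
ALTER TABLE `location_address` ADD INDEX `idx_la_location_id` (`location_id`);

-- Composite indexes so the id lookup and the joined/filtered
-- column can be read from the same index.
ALTER TABLE `addresses` ADD INDEX `idx_addresses_id_city` (`id`, `city_id`);
ALTER TABLE `cities` ADD INDEX `idx_cities_id_name` (`id`, `name`);
```

Because id is already the primary key on each table, the last two indexes may add little; it is worth checking with EXPLAIN which of them the optimizer actually picks before keeping all three.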

Answer 2 (score: -1)

I managed to solve the problem by adding an index to the pivot table:

ALTER TABLE `location_address` ADD INDEX `location_id_index` (`location_id` ASC);

Run time: 0.188 seconds

This is faster than the UNION approach.