Question

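I have a fairly complex query that I would really like to write with LEFT JOINs and without any UNION statements, but it runs far too slowly. Even after simplifying it to isolate the problem, I don't understand why a query should run this slowly.

I am using MySQL version 5.6.36-82.1-log.

Is there any way to optimize this query without using UNION?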

select SQL_NO_CACHE distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'

Run time: 13.422 seconds

When I split it up and use UNION, it is much faster:

(select SQL_NO_CACHE distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id` 
where `cities`.`name` = 'New York')
union
(select distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id` 
where `locations`.`description` like '%New York%')

Run time: 0.219 seconds

If I change the LEFT JOINs to (inner) JOINs, it is also fast (but locations without an address are omitted):

select SQL_NO_CACHE distinct `locations`.* from `locations` 
join `location_address` on `location_address`.`location_id` = `locations`.`id` 
join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'

Run time: 0.219 seconds

Also, adding the cities.name condition to the LEFT JOIN does not help:

select SQL_NO_CACHE distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id` AND `cities`.`name` = 'New York'
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'

Run time: 13.812 seconds

The number of rows in each table is:

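locations: ~5,000 rows
location_address: ~4,900 rows (~100 locations have 2 entries, ~200 locations have 0)
addresses: ~5,500 rows (~600 addresses are linked from other tables)
cities: ~30,000 rows (a complete database of US cities)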

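The id field on each table is the primary key, and cities.name is also indexed. locations.description is a long text field.

Here are some sample structures and data:

locations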

+----+----------------------+
| id | description          |
+----+----------------------+
| 1  | Somewhere out there  |
+----+----------------------+
| 2  | In New York          |
+----+----------------------+
| 3  | Elsewhere            |
+----+----------------------+

location_address

+----+-------------+------------+
| id | location_id | address_id |
+----+-------------+------------+
| 1  | 1           | 1          |
+----+-------------+------------+
| 2  | 1           | 2          |
+----+-------------+------------+
| 3  | 3           | 3          |
+----+-------------+------------+

addresses

+----+---------+
| id | city_id |
+----+---------+
| 1  | 1       |
+----+---------+
| 2  | 2       |
+----+---------+
| 3  | 2       |
+----+---------+

cities

+----+-----------+
| id | name      |
+----+-----------+
| 1  | New York  |
+----+-----------+
| 2  | Chicago   |
+----+-----------+
| 3  | Houston   |
+----+-----------+
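
For anyone who wants to reproduce the sample data, here is a minimal setup sketch. Only the table names, column names, and rows come from the tables above; the column types, the index name `cities_name_index`, and the choice of `TEXT` for the long text field are assumptions made for illustration.

```sql
-- Hypothetical schema: column types and index names are assumptions.
-- Note that location_address deliberately has no index on location_id,
-- matching the situation described in the question.
CREATE TABLE locations (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    description TEXT
);

CREATE TABLE cities (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    KEY cities_name_index (name)
);

CREATE TABLE addresses (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    city_id INT UNSIGNED NOT NULL
);

CREATE TABLE location_address (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    location_id INT UNSIGNED NOT NULL,
    address_id INT UNSIGNED NOT NULL
);

-- Sample rows copied from the tables above.
INSERT INTO locations (id, description) VALUES
    (1, 'Somewhere out there'),
    (2, 'In New York'),
    (3, 'Elsewhere');

INSERT INTO cities (id, name) VALUES
    (1, 'New York'),
    (2, 'Chicago'),
    (3, 'Houston');

INSERT INTO addresses (id, city_id) VALUES
    (1, 1),
    (2, 2),
    (3, 2);

INSERT INTO location_address (id, location_id, address_id) VALUES
    (1, 1, 1),
    (2, 1, 2),
    (3, 3, 3);
```

On this data the LEFT JOIN query with OR should return locations 1 and 2: location 1 through its New York address, and location 2 (which has no address) through its description.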

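I would really like to avoid UNION because I have many conditional filters, and sometimes I would have to leave out part of the union because I only want locations that have an address. Using UNION would also add a lot of complexity to my query-building code. I would like to avoid subqueries as well.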

Answer 1

You could write the query like this:

select *
from
(
    Select <sql statement a>
    UNION
    Select <sql statement b>
) x
where x. <extra where clauses here>

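You would put the minimal set of restrictions in the two inner selects of the union, then apply the additional restrictions to the result. I think this gives the most flexibility.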
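
Applied to the tables from the question, that pattern might look like the sketch below. The outer filter on `x.id` is a hypothetical placeholder for whatever extra conditions the query builder adds. The first branch uses inner joins on the assumption that a match on `cities.name` already requires an address, and the second branch skips the joins because only `locations` columns are selected.

```sql
SELECT x.*
FROM (
    -- Branch a: locations with an address in the city.
    SELECT l.*
    FROM locations l
    JOIN location_address la ON la.location_id = l.id
    JOIN addresses a ON la.address_id = a.id
    JOIN cities c ON a.city_id = c.id
    WHERE c.name = 'New York'

    UNION

    -- Branch b: locations whose description mentions the city.
    SELECT l.*
    FROM locations l
    WHERE l.description LIKE '%New York%'
) AS x
-- Hypothetical extra restriction applied to the combined result.
WHERE x.id BETWEEN 1 AND 1000;
```

UNION (as opposed to UNION ALL) removes duplicate rows, so no DISTINCT is needed inside the branches. One thing to keep in mind: as far as I know, MySQL 5.6 materializes the derived table before applying the outer WHERE, so filters placed outside do not reduce the work done inside the union.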

Answer 2

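If you look at the execution plans, you will see that they differ. The likely issue is that indexes can be used more effectively for the two subqueries. Database optimizers are notoriously poor at optimizing OR conditions.

By the way, how does this version perform?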
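
To compare the plans, you can prefix each statement with EXPLAIN. The sketch below does this for the OR query and for the first UNION branch from the question. A plausible reading, which the plans should confirm or refute, is that the `cities.name = 'New York'` condition alone rejects NULL-extended rows, so the optimizer can treat the LEFT JOINs as inner joins and start from the `cities.name` index, whereas the OR with `locations.description` forces it to start from `locations`.

```sql
-- Plan for the slow form: a single statement with OR across two tables.
EXPLAIN
SELECT DISTINCT locations.*
FROM locations
LEFT JOIN location_address ON location_address.location_id = locations.id
LEFT JOIN addresses ON location_address.address_id = addresses.id
LEFT JOIN cities ON addresses.city_id = cities.id
WHERE cities.name = 'New York'
   OR locations.description LIKE '%New York%';

-- Plan for the first UNION branch on its own, for comparison.
EXPLAIN
SELECT DISTINCT locations.*
FROM locations
LEFT JOIN location_address ON location_address.location_id = locations.id
LEFT JOIN addresses ON location_address.address_id = addresses.id
LEFT JOIN cities ON addresses.city_id = cities.id
WHERE cities.name = 'New York';
```

In the output, compare the table order and the `type`, `key`, and `rows` columns. A row for `location_address` with `type` of `ALL` and no `key` would suggest that the table is scanned in full for every row of `locations`.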

select SQL_NO_CACHE l.*
from locations l
where exists (select 1
              from location_address la join
                   addresses a
                   on la.address_id = a.id join
                   cities c
                   on a.city_id = c.id
              where la.location_id = l.id and c.name = 'New York'
             ) or
     l.description like '%New York%';

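You should be able to optimize this subquery so that it runs quickly. In addition, there is no overhead for removing duplicates.

For performance, you want indexes on location_address(location_id), addresses(id, city_id), and cities(id, name).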
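
As a sketch, those indexes could be created as follows. The index names are made up for illustration. If an index on `location_address.location_id` already exists, skip the first statement rather than creating a duplicate.

```sql
-- Lets the correlated subquery look up rows by la.location_id = l.id.
ALTER TABLE location_address
    ADD INDEX la_location_id_idx (location_id);

-- Covering indexes for the join columns used inside the subquery.
ALTER TABLE addresses
    ADD INDEX addresses_id_city_idx (id, city_id);

ALTER TABLE cities
    ADD INDEX cities_id_name_idx (id, name);
```

Since `id` is already the primary key on `addresses` and `cities`, the first of the three is the one most likely to change the plan; the other two mainly allow the join to be answered from the index alone.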

Answer 3

I managed to solve the problem by adding an index to the pivot table:

ALTER TABLE `location_address` ADD INDEX `location_id_index` (`location_id` ASC);

Run time: 0.188 seconds

This is faster than the UNION approach.
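
To check that the new index exists and is actually chosen for the join, something along these lines should work. It is a sketch against the tables from the question; the exact output depends on the data and server version.

```sql
-- List the indexes on the pivot table; location_id_index should appear.
SHOW INDEX FROM location_address;

-- Inspect the join between locations and the pivot table in isolation.
EXPLAIN
SELECT l.id
FROM locations l
LEFT JOIN location_address la ON la.location_id = l.id;
```

If the index is used, the EXPLAIN row for `la` would be expected to show `location_id_index` in the `key` column with a `type` of `ref` instead of `ALL`.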

Why is UNION so much faster than a LEFT JOIN with OR?
