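I have a fairly complex query that I would really like to write with LEFT JOINs and without any UNION statements, but it runs far too slowly. Even after simplifying it to isolate the problem, I cannot see why one form of the query should run so much faster than the other.
I am using MySQL version 5.6.36-82.1-log.
Is there any way to optimize this query without using UNION?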
select SQL_NO_CACHE distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'
Run time: 13.422 seconds
When I split it up and use UNION, it is much faster:
(select SQL_NO_CACHE distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York')
union
(select distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `locations`.`description` like '%New York%')
Run time: 0.219 seconds
If I change the LEFT JOINs to (inner) JOINs, it is also faster (but locations without an address are omitted):
select SQL_NO_CACHE distinct `locations`.* from `locations`
join `location_address` on `location_address`.`location_id` = `locations`.`id`
join `addresses` on `location_address`.`address_id` = `addresses`.`id`
join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'
Run time: 0.219 seconds
Adding the `cities`.`name` condition to the LEFT JOIN does not help either:
select SQL_NO_CACHE distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id` AND `cities`.`name` = 'New York'
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'
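Run time: 13.812 seconds
The number of entries in each table is:
The `id` field on each table is the primary index, and `cities`.`name` is also indexed. `locations`.`description` is a long text field.
Here are some sample structures and data: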
locations
+----+---------------------+
| id | description         |
+----+---------------------+
| 1  | Somewhere out there |
+----+---------------------+
| 2  | In New York         |
+----+---------------------+
| 3  | Elsewhere           |
+----+---------------------+
location_address
+----+-------------+------------+
| id | location_id | address_id |
+----+-------------+------------+
| 1  | 1           | 1          |
+----+-------------+------------+
| 2  | 1           | 2          |
+----+-------------+------------+
| 3  | 3           | 3          |
+----+-------------+------------+
addresses
+----+---------+
| id | city_id |
+----+---------+
| 1  | 1       |
+----+---------+
| 2  | 2       |
+----+---------+
| 3  | 2       |
+----+---------+
cities
+----+-----------+
| id | name      |
+----+-----------+
| 1  | New York  |
+----+-----------+
| 2  | Chicago   |
+----+-----------+
| 3  | Houston   |
+----+-----------+
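For anyone who wants to reproduce the sample above, here is a minimal sketch of the schema and rows. The column types (`INT`, `VARCHAR(100)`, `LONGTEXT`) and the index name `cities_name_index` are assumptions. Only the column names, the primary keys, the index on `cities`.`name`, and the rows themselves come from the description above.

```sql
-- Column types and the index name are assumed, not taken from the real schema.
CREATE TABLE `locations` (
  `id` INT NOT NULL PRIMARY KEY,
  `description` LONGTEXT
);

CREATE TABLE `cities` (
  `id` INT NOT NULL PRIMARY KEY,
  `name` VARCHAR(100),
  INDEX `cities_name_index` (`name`)
);

CREATE TABLE `addresses` (
  `id` INT NOT NULL PRIMARY KEY,
  `city_id` INT
);

-- Pivot table linking locations to addresses (no secondary indexes yet).
CREATE TABLE `location_address` (
  `id` INT NOT NULL PRIMARY KEY,
  `location_id` INT,
  `address_id` INT
);

INSERT INTO `locations` (`id`, `description`) VALUES
  (1, 'Somewhere out there'),
  (2, 'In New York'),
  (3, 'Elsewhere');

INSERT INTO `cities` (`id`, `name`) VALUES
  (1, 'New York'),
  (2, 'Chicago'),
  (3, 'Houston');

INSERT INTO `addresses` (`id`, `city_id`) VALUES
  (1, 1),
  (2, 2),
  (3, 2);

INSERT INTO `location_address` (`id`, `location_id`, `address_id`) VALUES
  (1, 1, 1),
  (2, 1, 2),
  (3, 3, 3);
```

On these rows the LEFT JOIN and UNION versions should return locations 1 and 2, while the inner-join version returns only location 1, because location 2 has no address.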
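I would really like to avoid UNION, because I have many conditional filters and sometimes I have to omit part of the union because I only want locations that have an address. Using UNION would also considerably increase the complexity of my query-building code. I would also like to avoid subqueries.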
Answer 0 (score: 1)
You could write the query like this:
select *
from
(
Select <sql statement a>
UNION
Select <sql statement b>
) x
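where x.<extra where clauses here>
You could put the least restrictive clauses in the two inner selects of the union and then apply the additional restrictions to the result. I think this gives the most flexibility.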
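As a rough illustration of this pattern with the tables from the question: each branch of the UNION carries only its own condition, and any further filters go on the outer query. The `x.id > 0` filter is purely hypothetical and stands in for whatever extra clauses you need. The first branch can use inner joins, because `cities`.`name` = 'New York' can never be true for a row where the LEFT JOIN found no city, and the second branch needs no joins at all.

```sql
select x.*
from
(
  -- Branch A: locations that have an address in the city
  select l.*
  from `locations` l
  join `location_address` la on la.`location_id` = l.`id`
  join `addresses` a on la.`address_id` = a.`id`
  join `cities` c on a.`city_id` = c.`id`
  where c.`name` = 'New York'
  union
  -- Branch B: locations whose description mentions the city
  select l.*
  from `locations` l
  where l.`description` like '%New York%'
) x
where x.`id` > 0;  -- hypothetical extra filter, replace with your own
```

Because `UNION` (without `ALL`) already removes duplicate rows, neither branch needs its own `distinct`.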
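Answer 1 (score: 0)
If you look at the execution plans, you will see that they are different. The issue is probably that indexes can be used more effectively for the two subqueries. Database optimizers are notoriously poor at optimizing `or`.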
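To compare the plans yourself, prefix each form of the query with `EXPLAIN`. This is a sketch using the queries from the question. The actual output depends on your data and indexes, so none is shown here.

```sql
-- Plan for the slow form: a single query with OR across two tables
explain
select distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%';

-- Plan for one branch of the UNION form, for comparison
explain
select distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York';
```

In the output, compare the `type`, `key` and `rows` columns for each table. A `type` of `ALL` with no `key` means that table is being read with a full scan.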
By the way, how does this version perform?
select SQL_NO_CACHE l.*
from locations l
where exists (select 1
from location_address la join
addresses a
on la.address_id = a.id join
cities c
on a.city_id = c.id
where la.location_id = l.id and c.name = 'New York'
) or
l.description like '%New York%';
You should be able to optimize this subquery so that it runs quickly. It also avoids the overhead of removing duplicates.
For performance, you want indexes on `location_address(location_id)`, `addresses(id, city_id)`, and `cities(id, name)`.
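A sketch of how such indexes could be created. The index names are made up for illustration, and `cities` is used as the table name to match the question.

```sql
-- Index names are hypothetical; pick whatever fits your naming scheme.
create index `idx_location_address_location_id`
  on `location_address` (`location_id`);

create index `idx_addresses_id_city_id`
  on `addresses` (`id`, `city_id`);

create index `idx_cities_id_name`
  on `cities` (`id`, `name`);
```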
Answer 2 (score: -1)
I managed to solve the problem by adding an index to the pivot table:
ALTER TABLE `location_address` ADD INDEX `location_id_index` (`location_id` ASC);
Run time: 0.188 seconds
This is even faster than the UNION approach.