Question

I am trying to optimize the following query:

select distinct this_.id as y0_
from Rental this_
    left outer join RentalRequest rentalrequ1_ 
      on this_.id=rentalrequ1_.rental_id
    left outer join RentalSegment rentalsegm2_ 
      on rentalrequ1_.id=rentalsegm2_.rentalRequest_id
where
    this_.DTYPE='B'
    and this_.id<=1848978
    and this_.billingStatus=1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
order by rentalsegm2_.id asc
limit 0, 100;

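This query is run many times in a row to page through records for processing (each time with a different limit). It returns the ids I need for that processing. My problem is that the query takes more than 3 seconds. I have about 2 million rows in each of the three tables.

EXPLAIN gives: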

+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+
| id | select_type | table        | type   | possible_keys                                       | key           | key_len | ref                                        | rows   | Extra                                        |
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | rentalsegm2_ | range  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | index_endDate | 9       | NULL                                       | 449904 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | eq_ref | PRIMARY,fk_rental_id_BikeRentalRequest              | PRIMARY       | 8       | solscsm_main.rentalsegm2_.rentalRequest_id |      1 | Using where                                  | 
|  1 | SIMPLE      | this_        | eq_ref | PRIMARY,index_billingStatus                         | PRIMARY       | 8       | solscsm_main.rentalrequ1_.rental_id        |      1 | Using where                                  | 
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+

I tried removing the distinct, and the query ran three times faster. EXPLAIN without the distinct gives:

+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+
| id | select_type | table        | type   | possible_keys                                       | key           | key_len | ref                                        | rows   | Extra                       |
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+
|  1 | SIMPLE      | rentalsegm2_ | range  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | index_endDate | 9       | NULL                                       | 451972 | Using where; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | eq_ref | PRIMARY,fk_rental_id_BikeRentalRequest              | PRIMARY       | 8       | solscsm_main.rentalsegm2_.rentalRequest_id |      1 | Using where                 | 
|  1 | SIMPLE      | this_        | eq_ref | PRIMARY,index_billingStatus                         | PRIMARY       | 8       | solscsm_main.rentalrequ1_.rental_id        |      1 | Using where                 | 
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+

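As you can see, Using temporary is added when distinct is used.

I already have an index on every field used in the where clause. Is there anything I can do to optimize this query?

Thank you very much!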

Edit: I tried ordering on this_.id as suggested, and the query became 5 times slower. Here is the explain plan:

+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+
| id | select_type | table        | type | possible_keys                                       | key                                   | key_len | ref                          | rows   | Extra                                        |
+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | this_        | ref  | PRIMARY,index_billingStatus                         | index_billingStatus                   | 5       | const                        | 782348 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | ref  | PRIMARY,fk_rental_id_BikeRentalRequest              | fk_rental_id_BikeRentalRequest        | 9       | solscsm_main.this_.id        |      1 | Using where; Using index; Distinct           | 
|  1 | SIMPLE      | rentalsegm2_ | ref  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | fk_rentalRequest_id_BikeRentalSegment | 8       | solscsm_main.rentalrequ1_.id |      1 | Using where; Distinct                        | 
+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+

Answer 1

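The query without distinct runs faster because you have a limit clause. Without distinct, the server only needs to look at the first 100 matches. However, some of those rows may contain duplicate values, so once you add the distinct clause the server has to look at many more rows to find 100 that are not duplicates.

BTW, why are you using OUTER JOIN?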
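The interaction between distinct and limit can be seen on a toy table. This is only an illustrative sketch: `demo_ids` is a hypothetical table that is not part of the schema in the question, and the actual number of rows read depends on the plan the optimizer picks.

```sql
-- Hypothetical demo table; not part of the schema in the question.
CREATE TEMPORARY TABLE demo_ids (id INT NOT NULL);
INSERT INTO demo_ids (id) VALUES (1), (1), (1), (2), (2), (3);

-- Can stop as soon as any three rows have been read,
-- even if all three carry the same id.
SELECT id FROM demo_ids LIMIT 0, 3;

-- Has to keep reading until three different ids have been seen
-- (here, potentially the whole table).
SELECT DISTINCT id FROM demo_ids LIMIT 0, 3;
```

With many duplicates per id, as a join across three tables can produce, the second form may have to touch far more rows before the limit is satisfied.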

Answer 2

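From the execution plan we can see that the optimizer is smart enough to understand that the OUTER JOIN is not needed here. Still, you should state that explicitly in the query.
The DISTINCT modifier effectively groups on all fields in the SELECT part: the server orders by all of the specified fields and then discards duplicates. In other words, the order by rentalsegm2_.id asc clause does not make any sense here, because several segments with different ids can map to the same this_.id.

The following query should return equivalent results: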

select distinct this_.id as y0_
from Rental this_
    join RentalRequest rentalrequ1_ 
      on this_.id=rentalrequ1_.rental_id
    join RentalSegment rentalsegm2_ 
      on rentalrequ1_.id=rentalsegm2_.rentalRequest_id
where
    this_.DTYPE='B'
    and this_.id<=1848978
    and this_.billingStatus=1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
limit 0, 100;
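Because the query is used for paging, a stable order may still be wanted. One possible sketch, assuming that "the smallest matching segment id per rental" is an acceptable ordering rule (that rule is an assumption, not something stated in the question), is to group on this_.id and order by an aggregate:

```sql
-- Sketch only: ordering by MIN(rentalsegm2_.id) is an assumed rule,
-- chosen so that each this_.id has exactly one sort key.
select this_.id as y0_
from Rental this_
    join RentalRequest rentalrequ1_
      on this_.id = rentalrequ1_.rental_id
    join RentalSegment rentalsegm2_
      on rentalrequ1_.id = rentalsegm2_.rentalRequest_id
where
    this_.DTYPE = 'B'
    and this_.id <= 1848978
    and this_.billingStatus = 1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
group by this_.id
order by min(rentalsegm2_.id) asc
limit 0, 100;
```

This makes the ordering well defined, but it will most likely still need a temporary table and a sort, so check it with EXPLAIN before relying on it.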

UPD

If you want the execution plan to start with RentalSegment, you will need to add the following indexes to the database:

RentalSegment (endDate)
RentalRequest (id, rental_id)
Rental (id, DTYPE, billingStatus) or (id, billingStatus, DTYPE)
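As a rough sketch, the indexes listed above could be created as follows. The index names are hypothetical. The EXPLAIN output in the question already shows an index called index_endDate on RentalSegment, which appears to cover endDate, so it is left commented out here.

```sql
-- Index names below are made up for illustration; use your own naming scheme.

-- RentalSegment (endDate): the question's EXPLAIN already lists index_endDate,
-- so this one probably exists and is shown only for completeness.
-- CREATE INDEX idx_segment_enddate ON RentalSegment (endDate);

-- Lets the join read rental_id from the index without touching the row.
CREATE INDEX idx_request_id_rental ON RentalRequest (id, rental_id);

-- Lets the DTYPE and billingStatus filters be checked from the index.
CREATE INDEX idx_rental_id_dtype_status ON Rental (id, DTYPE, billingStatus);
```

On tables with about 2 million rows each, building these indexes takes time and adds write overhead, so it is worth testing on a copy of the data first.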

The query can then be rewritten as follows:

SELECT this_.id as y0_
FROM RentalSegment rs
    JOIN RentalRequest rr
    JOIN Rental this_
WHERE rs.endDate between 1273631699529 and 1274927699529
    AND rs.rentalRequest_id = rr.id
    AND rr.rental_id <= 1848978
    AND rr.rental_id = this_.id
    AND this_.DTYPE='B'
    AND this_.billingStatus = 1
GROUP BY this_.id
LIMIT 0, 100;

If the execution plan does not start from RentalSegment, you can force it with STRAIGHT_JOIN.
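A minimal sketch of the STRAIGHT_JOIN variant, assuming the same tables and the DTYPE value 'B' from the question. In MySQL, SELECT STRAIGHT_JOIN asks the optimizer to join the tables in the order they are listed in the FROM clause, so RentalSegment is read first:

```sql
-- STRAIGHT_JOIN fixes the join order to: rs, then rr, then this_.
SELECT STRAIGHT_JOIN this_.id AS y0_
FROM RentalSegment rs
    JOIN RentalRequest rr
      ON rs.rentalRequest_id = rr.id
    JOIN Rental this_
      ON rr.rental_id = this_.id
WHERE rs.endDate BETWEEN 1273631699529 AND 1274927699529
    AND rr.rental_id <= 1848978
    AND this_.DTYPE = 'B'
    AND this_.billingStatus = 1
GROUP BY this_.id
LIMIT 0, 100;
```

Forcing the order removes the optimizer's freedom to adapt if the data distribution changes, so compare the EXPLAIN output and timings with and without the hint.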

Answer 3

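Here, for the "rentalsegm2_" table, the optimizer chose the "index_endDate" index, and the expected number of rows from that table is about 450,000 (4.5 lakh). Since there are other conditions, you can also check the indexes on the "this_" table. That is, check how many records each condition on "this_" actually matches.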

In short, you can try alternative solutions by changing the index the optimizer uses. This can be done with the "USE INDEX" and "FORCE INDEX" hints.
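As a hedged sketch of what such a hint looks like, the query below forces index_endDate (an index name taken from the EXPLAIN output in the question) on the RentalSegment alias. Whether this or a different index gives the better plan is an assumption to verify with EXPLAIN, not a recommendation:

```sql
-- The index hint goes right after the table alias.
-- FORCE INDEX makes a table scan look very expensive, so the optimizer
-- uses index_endDate for rentalsegm2_ whenever it can.
select distinct this_.id as y0_
from Rental this_
    join RentalRequest rentalrequ1_
      on this_.id = rentalrequ1_.rental_id
    join RentalSegment rentalsegm2_ force index (index_endDate)
      on rentalrequ1_.id = rentalsegm2_.rentalRequest_id
where
    this_.DTYPE = 'B'
    and this_.id <= 1848978
    and this_.billingStatus = 1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
limit 0, 100;
```

USE INDEX takes the same form but is only a suggestion: the optimizer may still choose a table scan if it estimates that to be cheaper.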

Thanks,

Rinson KE DBA www.qburst.com

MySQL query optimization - distinct, order by and limit

3 answers: