Question

I have the following tables (irrelevant columns removed):

create table Payment (
    id int not null auto_increment,
    status int not null,
    primary key(id)
);
create table Booking (
    id int not null auto_increment,
    paymentId int not null,
    nrOfPassengers int not null,
    primary key(id),
    key paymentFK (paymentId),
    constraint paymentFK foreign key (paymentId) references Payment(id)
);

Booking contains ~456k rows and Payment contains ~331k rows. The following query takes 0.06 s and returns 97 rows:

select * from Booking b
join Payment p on b.paymentId = p.id
where p.status = 3

If I add an ORDER BY clause, the query takes 4.4 s, roughly 70 times slower:

select * from Booking b
join Payment p on b.paymentId = p.id
where p.status = 3
order by b.nrOfPassengers

EXPLAIN for the first query:

id, select_type, table, type, possible_keys, key,       key_len, ref,  rows,   Extra
1,  SIMPLE,      p,     ALL,  PRIMARY,       NULL,      NULL,    NULL, 331299, Using where
1,  SIMPLE,      b,     ref,  paymentFK,     paymentFK, 9,       p.id, 1,      Using where

And for the second:

id, select_type, table, type, possible_keys, key,       key_len, ref,  rows,   Extra
1,  SIMPLE,      p,     ALL,  PRIMARY,       NULL,      NULL,    NULL, 331299, Using where; Using temporary; Using filesort
1,  SIMPLE,      b,     ref,  paymentFK,     paymentFK, 9,       p.id, 1,      Using where

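I am using MySQL 5.1.34.

The WHERE clause used in the query filters out the vast majority of the rows in Payment. My impression is that MySQL sorts the result set before filtering it with the (highly selective) WHERE clause. Am I right? If so, why does it do that? I have tried analyzing both tables, but the query plan did not change.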

Answer 1

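First, make sure your tables have appropriate indexes. Assuming they do and the query is still slower than expected, you can wrap the unsorted results in a subquery and apply the ORDER BY clause afterwards: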

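SELECT *
FROM (
   -- explicit column list: both tables have an id column, and a derived
   -- table may not contain duplicate column names
   select b.id as bookingId, b.paymentId, b.nrOfPassengers, p.status
   from Booking b
   join Payment p on b.paymentId = p.id
   where p.status = 3
) as filtered  -- MySQL requires an alias on every derived table
ORDER BY nrOfPassengers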

I'm not sure how much (or whether) this will help, because it adds a row when I look at the execution plan, but it might be faster.

Good luck.

Answer 2

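I suspect the problem is that among the irrelevant columns you removed there is a TEXT or BLOB column, which makes MySQL store the intermediate results of the temporary table on disk.

In any case, what the execution plan tells us is this: for each row in the Payment table, fetch it from disk and check the condition; if it holds, put the result for each matching row in Booking into a temporary table. Then the whole temporary table, with all of its data, is sorted by nrOfPassengers and output. If there are TEXT or BLOB fields, the intermediate table is stored and sorted on disk, because MySQL cannot predict the size of the table.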
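One way to test this suspicion is to compare MySQL's temporary-table counters before and after the slow query. This is only a sketch: it assumes you run all three statements in the same session, and it uses the question's query unchanged.

```sql
-- Record the counters before running the slow query.
SHOW SESSION STATUS LIKE 'Created_tmp%tables';

-- The slow query from the question.
select * from Booking b
join Payment p on b.paymentId = p.id
where p.status = 3
order by b.nrOfPassengers;

-- Read the counters again and compare with the first reading.
SHOW SESSION STATUS LIKE 'Created_tmp%tables';
```

If Created_tmp_disk_tables grows between the two readings, the temporary table was written to disk. Created_tmp_tables is less telling here, because SHOW STATUS itself uses an internal temporary table.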

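What you can do (as always) is minimize disk operations. As @ajreal suggested, add an index on the status column. If it is that selective, you won't need any other index, but it will be even better if you extend paymentFK to (paymentId, nrOfPassengers). Now rewrite the query as follows: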

SELECT p.*, b.*
FROM (
  select p.id as paymentId, b.id as bookingId
  from Booking b
  join Payment p on b.paymentId = p.id
  where p.status = 3
  order by b.nrOfPassengers
) as ids
JOIN Payment p ON ids.paymentId = p.id
JOIN Booking b ON ids.bookingId = b.id;

The rows should come out in the subquery's order.
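The suggestions in this answer can be combined roughly as follows. Treat it as a sketch under a few assumptions: the index names statusIdx and paymentPassengersIdx are made up for illustration, the composite index is added alongside paymentFK rather than replacing it, and the ORDER BY is moved to the outer query because SQL only guarantees row order when the outermost SELECT has its own ORDER BY.

```sql
-- Hypothetical index names; only the column lists come from the answer.
ALTER TABLE Payment ADD INDEX statusIdx (status);
ALTER TABLE Booking ADD INDEX paymentPassengersIdx (paymentId, nrOfPassengers);

SELECT p.*, b.*
FROM (
  -- Narrow derived table: only the ids and the sort key, no wide columns.
  select p.id as paymentId, b.id as bookingId, b.nrOfPassengers as nrOfPassengers
  from Booking b
  join Payment p on b.paymentId = p.id
  where p.status = 3
) as ids
JOIN Payment p ON ids.paymentId = p.id
JOIN Booking b ON ids.bookingId = b.id
-- Explicit outer sort; cheap here because only ~97 rows survive the filter.
ORDER BY ids.nrOfPassengers;
```

Running EXPLAIN on this statement before and after adding the indexes shows whether the optimizer actually picks them up.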
