Question

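I have a problematic query that I know how to write so that it runs faster, but the faster form is technically invalid SQL and is not guaranteed to keep working in the future.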

The original, slow query looks like this:

SELECT sql_no_cache DISTINCT r.field_1 value
FROM table_middle m
JOIN table_right r on r.id = m.id
WHERE ((r.field_1) IS NOT NULL) 
AND (m.kind IN ('partial')) 
ORDER BY r.field_1 
LIMIT 26

This takes 37 seconds. EXPLAIN output:

+----+-------------+-------+--------+-----------------------+---------------+---------+---------+-----------------------------------------------------------+
| id | select_type | table | type   | possible_keys         | key           | key_len | rows    | Extra                                                     |
+----+-------------+-------+--------+-----------------------+---------------+---------+---------+-----------------------------------------------------------+
|  1 | SIMPLE      | r     | range  | PRIMARY,index_field_1 | index_field_1 | 9       | 1544595 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | m     | eq_ref | PRIMARY,index_kind    | PRIMARY       | 4       |       1 | Using where; Distinct                                     |
+----+-------------+-------+--------+-----------------------+---------------+---------+---------+-----------------------------------------------------------+

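The faster version looks like this. The ORDER BY clause is pushed down into a subquery in the FROM clause, and the outer query then applies DISTINCT and LIMIT to that subquery's result: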

SELECT sql_no_cache DISTINCT value 
FROM (
  SELECT r.field_1 value
  FROM table_middle m
  JOIN table_right r ON r.id = m.id
  WHERE ((r.field_1) IS NOT NULL) 
  AND (m.kind IN ('partial')) 
  ORDER BY r.field_1 
) t
LIMIT 26

This takes 2.7 seconds. EXPLAIN output:

+----+-------------+------------+--------+-----------------------+------------+---------+---------+-----------------------------------------------------------+
| id | select_type | table      | type   | possible_keys         | key        | key_len | rows    | Extra                                                     |
+----+-------------+------------+--------+-----------------------+------------+---------+---------+-----------------------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                  | NULL       | NULL    | 1346348 | Using temporary                                           |
|  2 | DERIVED     | m          | ref    | PRIMARY,index_kind    | index_kind | 99      | 1539558 | Using where; Using index; Using temporary; Using filesort |
|  2 | DERIVED     | r          | eq_ref | PRIMARY,index_field_1 | PRIMARY    | 4       |       1 | Using where                                               |
+----+-------------+------------+--------+-----------------------+------------+---------+---------+-----------------------------------------------------------+

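There are three million rows in each of table_right and table_middle, and every column mentioned is individually indexed. The query should be read as having an arbitrary WHERE clause, because it is generated dynamically. It cannot be rewritten in any way that would prevent the WHERE clause from being easily replaced. Likewise, the indexes cannot be changed: MySQL does not support enough indexes to cover the number of potential filter-field combinations.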

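Has anyone seen this problem before, specifically SELECT / DISTINCT / ORDER BY / LIMIT performing very poorly? Is there another way to write the same query that performs well but does not rely on unspecified implementation behavior?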

(For example, AFAIK MariaDB ignores ORDER BY in a subquery, because it should not logically affect the set-theoretic semantics of the query.)
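
For contrast, here is a sketch of where the ORDER BY has to sit for the ordering to be guaranteed rather than incidental: on the outermost SELECT. It reuses the table and column names from the queries above. It is shown only to make the semantic difference concrete; it has not been timed, and it may well perform no better than the original slow query.

```sql
-- Well-defined variant: the derived table is treated as an unordered set,
-- and the ordering is requested where the result is actually produced.
-- Performance is NOT claimed; this only illustrates the guaranteed semantics.
SELECT SQL_NO_CACHE DISTINCT value
FROM (
  SELECT r.field_1 AS value
  FROM table_middle m
  JOIN table_right r ON r.id = m.id
  WHERE r.field_1 IS NOT NULL
    AND m.kind IN ('partial')
) t
ORDER BY value
LIMIT 26;
```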

For the more incredulous

Here is how to create a version of the database for yourself. This should output a SQL script that you can run with the mysql command-line client:

#!/usr/bin/env ruby
puts "create database testy;"
puts "use testy;"
puts "create table table_right(id int(11) not null primary key, field_0 int(11), field_1 int(11), field_2 int(11));"
puts "create table table_middle(id int(11) not null primary key, field_0 int(11), field_1 int(11), field_2 int(11));"
puts "begin;"
3_000_000.times do |x|
  puts "insert into table_middle values (#{x},#{x*10},#{x*100},#{x*1000});"
  puts "insert into table_right values (#{x},#{x*10},#{x*100},#{x*1000});"
end
puts "commit;"

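Indexes are not important for reproducing the effect. The script above is untested; it is an approximation of the pry session I used when reproducing the problem by hand.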

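Replace m.kind IN ('partial') with m.field_1 > 0 or some similarly trivial condition. Observe the large difference in performance between the two techniques, and how the sorting semantics are preserved (tested with MySQL 5.5). The unreliability of those semantics is, of course, exactly why I am asking this question.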
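
Adapted to the test schema generated by the Ruby script (which has no kind column), the two queries would look roughly like the sketch below. The substitution of m.field_1 > 0 follows the note above; like the script itself, these statements are untested here.

```sql
-- Slow form: DISTINCT, ORDER BY and LIMIT all on the joined result.
SELECT SQL_NO_CACHE DISTINCT r.field_1 AS value
FROM table_middle m
JOIN table_right r ON r.id = m.id
WHERE r.field_1 IS NOT NULL
  AND m.field_1 > 0
ORDER BY r.field_1
LIMIT 26;

-- Fast form: ORDER BY pushed into the derived table.
-- The ordering of the outer result relies on unspecified behavior.
SELECT SQL_NO_CACHE DISTINCT value
FROM (
  SELECT r.field_1 AS value
  FROM table_middle m
  JOIN table_right r ON r.id = m.id
  WHERE r.field_1 IS NOT NULL
    AND m.field_1 > 0
  ORDER BY r.field_1
) t
LIMIT 26;
```

Prefixing either statement with EXPLAIN should show plans comparable to the ones quoted earlier.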

Answer 1

Please provide SHOW CREATE TABLE. In the absence of that, I will guess that these composite indexes are missing and may be useful:

m:  (kind, id)
r:  (field_1, id)
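
As a sketch, the two composite indexes suggested above could be created as follows. The index names are hypothetical, and the statements assume that table_middle really has the kind column used in the question's query (the test script does not create it).

```sql
-- Hypothetical index names; column lists are taken from the suggestion above.
ALTER TABLE table_middle ADD INDEX idx_kind_id (kind, id);
ALTER TABLE table_right  ADD INDEX idx_field_1_id (field_1, id);
```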

You can turn off MariaDB's ignoring of the subquery's ORDER BY.
