Question

I have the following table:

CREATE TABLE `test_series_analysis_data` (
  `email` varchar(255) NOT NULL,
  `mappingId` int(11) NOT NULL,
  `packageId` varchar(255) NOT NULL,
  `sectionName` varchar(255) NOT NULL,
  `createdAt` datetime(3) DEFAULT NULL,
  `marksObtained` float NOT NULL,
  `updatedAt` datetime DEFAULT NULL,
  `testMetaData` longtext,
  PRIMARY KEY (`email`,`mappingId`,`packageId`,`sectionName`),
  KEY `rank_index` (`mappingId`,`packageId`,`sectionName`,`marksObtained`),
  KEY `mapping_package` (`mappingId`,`packageId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Below is the EXPLAIN output for the query:

explain select rank 
from (
   select email, @i:=@i+1 as rank 
   from test_series_analysis_data ta 
   join (select @i:=0) va 
   where mappingId = ?1 
   and packageId = ?2 
   and sectionName = ?3 
   order by marksObtained desc
) as inter 
where inter.email = ?4;

+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
| id | select_type | table      | partitions | type   | possible_keys              | key         | key_len | ref   | rows  | filtered | Extra                    |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ref    | <auto_key0>                | <auto_key0> | 767     | const |    10 |   100.00 | NULL                     |
|  2 | DERIVED     | <derived3> | NULL       | system | NULL                       | NULL        | NULL    | NULL  |     1 |   100.00 | Using filesort           |
|  2 | DERIVED     | ta         | NULL       | ref    | rank_index,mapping_package | rank_index  | 4       | const | 20160 |     1.00 | Using where; Using index |
|  3 | DERIVED     | NULL       | NULL       | NULL   | NULL                       | NULL        | NULL    | NULL  |  NULL |     NULL | No tables used           |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+

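The query optimizer could have used either of the two indexes, but rank_index is a covering index, so it was chosen. What surprised me is the output of the following query: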

explain select rank 
from ( 
  select email, @i:=@i+1 as rank 
  from test_series_analysis_data ta use index (mapping_package) 
  join (select @i:=0) va 
  where mappingId = ?1 
  and packageId = ?2 
  and sectionName = ?3 
  order by marksObtained desc
) as inter 
where inter.email = ?4;

+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
| id | select_type | table      | partitions | type   | possible_keys   | key             | key_len | ref   | rows  | filtered | Extra                 |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ref    | <auto_key0>     | <auto_key0>     | 767     | const |    10 |   100.00 | NULL                  |
|  2 | DERIVED     | <derived3> | NULL       | system | NULL            | NULL            | NULL    | NULL  |     1 |   100.00 | Using filesort        |
|  2 | DERIVED     | ta         | NULL       | ref    | mapping_package | mapping_package | 4       | const | 19434 |     1.00 | Using index condition |
|  3 | DERIVED     | NULL       | NULL       | NULL   | NULL            | NULL            | NULL    | NULL  |  NULL |     NULL | No tables used        |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+

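Why is rows smaller (19434 < 20160) when the index used is mapping_package? rank_index can select what is needed more precisely, so the row count for rank_index should be lower.

Does this mean that, for the given query, the mapping_package index is better than rank_index?

Does the fact that sectionName is a varchar have any effect, so that both indexes should give similar performance?

Also, I assume that Using index condition selects only a few rows from the index and then scans some more. In the case of Using where; Using index, the optimizer only has to read the index, not the table, to get the rows, and then it filters the data. So why is Using where missing when rank_index is used?

Moreover, why is key_len for mapping_package 4 when the index has only two columns?

Any help is appreciated.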

Answer 1

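(19434 < 20160): both of those numbers are estimates. It is unusual for them to be that close. I'll bet that if you ran ANALYZE TABLE, both would change, possibly flipping the inequality.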
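As a minimal sketch of that check (assuming you have the privileges to run it and that refreshing statistics is acceptable on your server), you could recompute the statistics and then rerun both EXPLAIN statements from the question:

```sql
-- Recompute the index statistics that the optimizer's row estimates rely on.
ANALYZE TABLE test_series_analysis_data;

-- Inspect the per-index cardinality estimates afterwards.
SHOW INDEX FROM test_series_analysis_data;
```

The rows values in EXPLAIN are estimates, so they may move in either direction after this; a difference of a few percent between two indexes says little about which one is better.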

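Notice something else: Using where; Using index versus Using index condition.

But first, let me remind you that in InnoDB the PRIMARY KEY columns are appended to every secondary key. So, effectively, you have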

KEY `rank_index`      (`mappingId`,`packageId`,`sectionName`,`marksObtained`,`email`)
KEY `mapping_package` (`mappingId`,`packageId`,`email`,`sectionName`)

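Now let's decide what the optimal index should be for: where mappingId = ?1 and packageId = ?2 and sectionName = ?3 order by marksObtained desc

First, the = parts of the WHERE: mappingId, packageId, sectionName, in any order;
Then the ORDER BY column: marksObtained
Bonus: finally, if email (the only other column mentioned anywhere in the SELECT) is also in the key, then the index will be "covering".

This says that rank_index is "perfect", and the other index is not so good. Alas, EXPLAIN does not say that clearly.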
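To make the three steps concrete, here is the column order they produce, written out as an index definition. This is an illustration only: the name rank_index_explicit is hypothetical, and since InnoDB already appends email to rank_index, the existing index is effectively the same and there is no need to create this one.

```sql
-- Hypothetical index name; shown only to spell out the column order.
CREATE INDEX rank_index_explicit
    ON test_series_analysis_data
       (mappingId, packageId, sectionName,  -- step 1: "=" columns from WHERE
        marksObtained,                      -- step 2: the ORDER BY column
        email);                             -- step 3: remaining SELECT column (covering)
```

With mapping_package, by contrast, sectionName comes after email in the effective key and marksObtained is not in the key at all, so the index can neither narrow by sectionName efficiently nor cover the query.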

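You, too, could have figured this out; all you need is to study my blog: http://mysql.rjweb.org/doc.php/index_cookbook_mysql (Sorry; it is getting late, and I am getting cheeky.)

Other tips:

Don't blindly use (255). When a tmp table is needed, this makes the tmp table bigger, hence less efficient. Lower the limit to something reasonable. Or...
If this is a huge table, you really should "normalize" the strings, replacing them with, say, a 2-byte SMALLINT UNSIGNED. This will improve performance in other ways too, such as reducing costly I/O. (OK, about 20K rows is quite small, so this may not apply.)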
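A rough sketch of what normalizing one of the string columns could look like. The table name section_names, the column name sectionId, and the VARCHAR(100) limit are all assumptions made for illustration; pick names and a length that fit your data.

```sql
-- Hypothetical lookup table: each distinct section name is stored once.
CREATE TABLE section_names (
  sectionId   SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  sectionName VARCHAR(100) NOT NULL,   -- assumed limit, smaller than 255
  PRIMARY KEY (sectionId),
  UNIQUE KEY (sectionName)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Look up the 2-byte id once, then filter the big table on it
-- instead of on the string.
SELECT sectionId
FROM section_names
WHERE sectionName = 'some section';
```

The main table would then store sectionId in place of sectionName, which shrinks the primary key and every secondary index that carries it.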

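Why is key_len 4? That implies that only one column was used, namely the 4-byte INT mappingId. I would have expected it to use the second column, too. So, I am stumped. More detailed EXPLAIN output may provide more clues.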
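One way to dig into the key_len question is the JSON form of EXPLAIN. The sketch below runs it on the inner query only, with made-up literal values in place of ?1, ?2 and ?3. The comments work out what key_len would be if both columns of mapping_package were used.

```sql
-- key_len arithmetic for this table (CHARSET=utf8, columns NOT NULL):
--   mappingId  INT           -> 4 bytes
--   packageId  VARCHAR(255)  -> 255 * 3 + 2 = 767 bytes
--   both columns used        -> 4 + 767 = 771
-- The 767 matches the key_len shown for <auto_key0> on email above.
EXPLAIN FORMAT=JSON
SELECT email
FROM test_series_analysis_data USE INDEX (mapping_package)
WHERE mappingId = 1                -- placeholder value
  AND packageId = 'pkg-1'          -- placeholder value
  AND sectionName = 'section-1'    -- placeholder value
ORDER BY marksObtained DESC;
```

In versions that report it, the used_key_parts entry in the JSON output lists which index columns were actually used, which is more direct than inferring them from key_len.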
