Question

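I have a table with about 500,000 rows and am testing two composite indexes on it. The first index puts the ORDER BY column last; the second has the columns in the opposite order.

What I don't understand is why the second index appears to perform better, with an estimated 30 rows to scan instead of 889 for the first query. I was under the impression that the second index could not be used properly, because the ORDER BY column is not last. Can anyone explain why this is the case? If both indexes are present, MySQL prefers the first one.

Note that the second EXPLAIN lists possible_keys as NULL but still lists a chosen key.

1) First index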

ALTER TABLE user ADD INDEX test1_idx (city_id, quality);

(cardinality 12942)

EXPLAIN SELECT * FROM user u WHERE u.city_id = 3205 ORDER BY u.quality DESC LIMIT 30;
+----+-------------+-------+--------+---------------+-----------+---------+----------------+------+-------------+
| id | select_type | table | type   | possible_keys | key       | key_len | ref            | rows | Extra       |
+----+-------------+-------+--------+---------------+-----------+---------+----------------+------+-------------+
|  1 | SIMPLE      | u     | ref    | test1_idx     | test1_idx | 3       | const          |  889 | Using where | 
+----+-------------+-------+--------+---------------+-----------+---------+----------------+------+-------------+

2) Second index (the same columns in reverse order)

ALTER TABLE user ADD INDEX test2_idx (quality, city_id);

(cardinality 7549)

EXPLAIN SELECT * FROM user u WHERE u.city_id = 3205 ORDER BY u.quality DESC LIMIT 30;
+----+-------------+-------+--------+---------------+-----------+---------+----------------+------+-------------+
| id | select_type | table | type   | possible_keys | key       | key_len | ref            | rows | Extra       |
+----+-------------+-------+--------+---------------+-----------+---------+----------------+------+-------------+
|  1 | SIMPLE      | u     | index  | NULL          | test2_idx | 5       | NULL           |  30  | Using where | 
+----+-------------+-------+--------+---------------+-----------+---------+----------------+------+-------------+

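Update:

In real-life use, the second query performs poorly, while the first performs well, as expected. I would still be curious to know why MySQL EXPLAIN provides such contradictory information.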

Answer 1

My guess at your data types: city_id is MEDIUMINT (3 bytes) and quality is SMALLINT (2 bytes).

As far as I know, for the query in the question, the second index (quality, city_id) cannot be fully used. Because ORDER BY acts as a range scan, it can only be served by the last part of the index being used.

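The first index looks like a perfect fit. I guess MySQL is sometimes not that smart. Perhaps the number of rows matching the target city_id affects which index MySQL decides to use.

You could try specifying the index with a keyword: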

SELECT * FROM user u WHERE u.city_id = 3205 ORDER BY u.quality DESC LIMIT 30;
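
If the keyword in question is an index hint, a minimal sketch might look like the following. It assumes MySQL's `FORCE INDEX` syntax, with the hint placed after the table alias; the index names are the ones from the question.

```sql
-- Assumption: the suggestion above refers to an index hint.
-- Force the (city_id, quality) index:
SELECT * FROM user u FORCE INDEX (test1_idx)
WHERE u.city_id = 3205
ORDER BY u.quality DESC
LIMIT 30;

-- The same query forced onto (quality, city_id), for comparison:
SELECT * FROM user u FORCE INDEX (test2_idx)
WHERE u.city_id = 3205
ORDER BY u.quality DESC
LIMIT 30;
```

Timing both variants against real data is likely to say more about which index is actually better than the EXPLAIN estimates do.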

Answer 2

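The rows value in EXPLAIN is only an estimate of the number of rows MySQL believes it must examine to produce the result.

I remember reading an article by Peter Zaitsev of Percona saying that this number can be very inaccurate. So you cannot simply compare query efficiency based on this number.

I agree with you that under normal circumstances the first index will produce better results.

You will have noticed that the type column is ref in the first EXPLAIN and index in the second. ref is usually better than an index scan. As you mentioned, if both keys exist, MySQL prefers the first one.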
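
Since the estimate can mislead, one way to see what a query really did is to read the session's handler counters after running it. This is only a sketch, assuming a MySQL session with permission to run `FLUSH STATUS`; the query is the one from the question.

```sql
-- Reset this session's status counters (FLUSH requires the RELOAD privilege).
FLUSH STATUS;

-- Run the query under test (repeat the whole script once per index).
SELECT * FROM user u
WHERE u.city_id = 3205
ORDER BY u.quality DESC
LIMIT 30;

-- Handler_read_* counters report the index and row reads actually performed,
-- e.g. Handler_read_key, Handler_read_next, Handler_read_prev.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

Comparing these counters between the two indexes gives a measured row count rather than the optimizer's guess.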

MySQL composite index column order and performance
