Question

A table has 1,500,000 records, of which 1,250,000 have field = 'z'. I need to select a random row whose field is not 'z'.

$random = mt_rand(1, 250000);  
$query = "SELECT field FROM table WHERE field != 'z' LIMIT $random, 1";

This works fine.

Then I decided to optimize it and added an index on field.

The result was strange: the query became about 3 times slower. I tested it.
Why is it slower? Shouldn't an index like this make it faster?

The table is MyISAM.

explain with index:  
id  select_type  table  type  possible_keys  key   key_len  ref  rows     Extra  
1   SIMPLE       table  range field          field 758      NULL 1139287  Using  

explain without index:  
id  select_type  table  type  possible_keys  key  key_len  ref  rows     Extra  
1   SIMPLE       table  ALL   NULL           NULL NULL     NULL 1484672  Using where

Answer 1

Summary

The problem is that field is a poor candidate for indexing, because of the nature of B-trees.

Explanation

Suppose you have a table holding the results of 500,000 coin tosses, where each toss is either 1 (heads) or 0 (tails):

CREATE TABLE toss (
    id int NOT NULL AUTO_INCREMENT,
    result int NOT NULL DEFAULT '0',
    PRIMARY KEY ( id )
);

select result, count(*) from toss group by result order by result;
+--------+----------+
| result | count(*) |
+--------+----------+
|      0 |   250290 |
|      1 |   249710 |
+--------+----------+
2 rows in set (0.40 sec)

If you want to select one toss (at random) that came up tails, you need to search through the table, starting from a random position.

select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id     | result |
+--------+--------+
| 246700 |      0 |
+--------+--------+
1 row in set (0.06 sec)

explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | toss  | ALL  | NULL          | NULL | NULL    | NULL | 500000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+

You can see that you are basically searching sequentially through all of the rows to find a match.

If you create an index on the result field of toss, the index will contain two values, each with roughly 250,000 entries.

create index foo on toss ( result );
Query OK, 500000 rows affected (2.48 sec)
Records: 500000  Duplicates: 0  Warnings: 0

select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id     | result |
+--------+--------+
| 246700 |      0 |
+--------+--------+
1 row in set (0.25 sec)

explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | toss  | range | foo           | foo  | 4       | NULL | 154565 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+

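Now you are searching fewer records, but the search time goes up from 0.06 to 0.25 seconds. Why? Because sequentially scanning an index is actually less efficient than sequentially scanning a table when the index has a large number of rows for a given key.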

Let's look at the indexes on this table:

show index from toss;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| toss  |          0 | PRIMARY  |            1 | id          | A         |      500000 |     NULL | NULL   |      | BTREE      |         |
| toss  |          1 | foo      |            1 | result      | A         |           2 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

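The PRIMARY index is a good index: there are 500,000 rows and 500,000 distinct values. Arranged in a BTREE, it lets you quickly identify a single row based on the id.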

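The foo index is a bad index: there are 500,000 rows but only 2 possible values. This is pretty much the worst case for a BTREE: you pay all the overhead of searching the index and still have to search through the results.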
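
To put numbers on the good index versus bad index comparison, here is a minimal Python sketch that computes selectivity (distinct values divided by total rows) and the average number of rows per key from the Cardinality column shown above. The function names selectivity and rows_per_key are made up for illustration; the figures are the ones from the SHOW INDEX output.

```python
def selectivity(cardinality: int, total_rows: int) -> float:
    """Distinct values per row; 1.0 means every indexed value is unique."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return cardinality / total_rows


def rows_per_key(cardinality: int, total_rows: int) -> float:
    """Average number of rows that share a single indexed value."""
    if cardinality <= 0:
        raise ValueError("cardinality must be positive")
    return total_rows / cardinality


TOTAL_ROWS = 500_000  # rows in the toss table

# Cardinality values copied from the SHOW INDEX output above.
for name, cardinality in (("PRIMARY", 500_000), ("foo", 2)):
    print(
        name,
        "selectivity:", selectivity(cardinality, TOTAL_ROWS),
        "rows per key:", rows_per_key(cardinality, TOTAL_ROWS),
    )
```

A selectivity near 1 means a lookup narrows the search to about one row. A selectivity near 0, as with foo, means each key still leaves hundreds of thousands of rows to walk through.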

Answer 2

Without an ORDER BY clause, LIMIT $random, 1 starts from some undefined position.

And according to your EXPLAIN output, the index isn't even being used.
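
One way to make the offset well defined is to add an ORDER BY and draw the offset from the actual number of matching rows. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name t, its contents, and the helper pick_random_non_z are hypothetical. It shows the query shape, not MySQL performance. Note that LIMIT offsets are zero-based, so the offset is drawn from 0 to count - 1.

```python
import random
import sqlite3


def pick_random_non_z(conn, rng):
    """Return (id, field) for a uniformly chosen row with field != 'z', or None."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM t WHERE field != 'z'"
    ).fetchone()
    if count == 0:
        return None
    offset = rng.randrange(count)  # zero-based offset in [0, count - 1]
    # ORDER BY id fixes the row order, so a given offset always means the same row.
    return conn.execute(
        "SELECT id, field FROM t WHERE field != 'z' "
        "ORDER BY id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()


# Hypothetical sample data: ids 1..6, where ids 2, 4 and 6 are not 'z'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO t (field) VALUES (?)",
    [("z",), ("a",), ("z",), ("b",), ("z",), ("c",)],
)

print(pick_random_non_z(conn, random.Random()))
```

The extra COUNT(*) costs a second pass over the data, so this trades speed for a well-defined result.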

Why does an index make this query slower?
