Question

I'm new to MySQL and am trying to use it in a project that basically tracks player performance. Below are the table's fields.

+-------------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
| Field             | Type                 | Collation         | Null | Key | Default | Extra          | Privileges                      | Comment |
+-------------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
| unique_id         | int(11)              | NULL              | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
| record_time       | datetime             | NULL              | NO   |     | NULL    |                | select,insert,update,references |         |
| game_sourceid     | char(20)             | latin1_swedish_ci | NO   | MUL | NULL    |                | select,insert,update,references |         |
| game_number       | smallint(6)          | NULL              | NO   |     | NULL    |                | select,insert,update,references |         |
| game_difficulty   | char(12)             | latin1_swedish_ci | NO   | MUL | NULL    |                | select,insert,update,references |         |
| cost_time         | smallint(5) unsigned | NULL              | NO   | MUL | NULL    |                | select,insert,update,references |         |
| country           | char(3)              | latin1_swedish_ci | NO   |     | NULL    |                | select,insert,update,references |         |
| source            | char(7)              | latin1_swedish_ci | NO   |     | NULL    |                | select,insert,update,references |         |
+-------------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+

I added indexes on game_sourceid and game_difficulty, and the engine is InnoDB.

I inserted about 11M rows of test data into this table. The data is randomly generated but resembles the real data.

Most of my queries look like this one, which gets the average time and the best time for a specific game_sourceid:

SELECT avg(cost_time) AS avgtime
    , min(cost_time) AS mintime
    , count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_1';

+-----------+---------+--------+
| avgtime   | mintime | count  |
+-----------+---------+--------+
| 1681.2851 |     420 | 138034 |
+-----------+---------+--------+
1 row in set (4.97 sec)

The query took about 5 s.

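I searched for this, and some people said the slowness might be caused by the number of rows being counted, so I tried narrowing the range like this: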

SELECT avg(cost_time) AS avgtime
    , min(cost_time) AS mintime
    , count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_1'
    AND record_time > '2015-11-19 04:40:00';

+-----------+---------+-------+
| avgtime   | mintime | count |
+-----------+---------+-------+
| 1275.2222 |     214 |     9 |
+-----------+---------+-------+

1 row in set (4.46 sec)

As you can see, even 9 matching rows still take about 4.5 s, so I don't think the row count is the problem.

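The test data is randomly generated to simulate real users' activity, so the rows for any given game_sourceid are not contiguous. To compare, I appended about 250k more rows that all use game_sourceid='standard_easy_9' while keeping every other field random; in other words, the last 250k rows of this table share the same game_sourceid. Then I ran this query: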

SELECT avg(cost_time) AS avgtime
    , min(cost_time) AS mintime
    , count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_9';

+-----------+---------+--------+
| avgtime   | mintime | count  |
+-----------+---------+--------+
| 1271.4806 |      70 | 259379 |
+-----------+---------+--------+
1 row in set (0.40 sec)

This time the query took only 0.4 s, which was completely unexpected.

So here is the problem: the real data arrives from players in real time, so it will necessarily be random and non-contiguous.

I'm considering splitting the data into separate tables by game_sourceid, but that would require another 80 tables, and possibly more in the future.

Since I'm new to MySQL, I'd like to know whether there is another solution, or whether my query is simply bad.

Update: here are my table's indexes:

mysql> show index from statistics_work_table;

+-----------------------+------------+-------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                 | Non_unique | Key_name                | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------------+------------+-------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| statistics_work_table |          0 | PRIMARY                 |            1 | unique_id       | A         |    11362113 |     NULL | NULL   |      | BTREE      |         |               |
| statistics_work_table |          1 | GameSourceId_CostTime   |            1 | game_sourceid   | A         |          18 |     NULL | NULL   |      | BTREE      |         |               |
| statistics_work_table |          1 | GameSourceId_CostTime   |            2 | cost_time       | A         |      344306 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------------+------------+-------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Answer 1

ALTER TABLE `statistics_work_table`
ADD INDEX `GameSourceId_CostTime` (`game_sourceid`, `cost_time`);

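This index should make your query very fast. Also, after running the statement above, you should drop the single-column index on game_sourceid: the new composite index makes it redundant, and keeping a redundant index only slows down inserts.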
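The redundancy claim rests on the leftmost-prefix rule: an index on (game_sourceid, cost_time) can serve any lookup that filters on game_sourceid alone. A minimal sketch of that, assuming a scaled-down copy of the table in SQLite (via Python's built-in sqlite3 module) rather than MySQL; the name idx_sourceid for the old single-column index is hypothetical, since the question does not show its real name.

```python
import sqlite3

# Hypothetical miniature of the question's table, in SQLite rather than MySQL/InnoDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE statistics_work_table (
        unique_id     INTEGER PRIMARY KEY,  -- rowid alias, like the auto_increment PK
        record_time   TEXT    NOT NULL,
        game_sourceid TEXT    NOT NULL,
        cost_time     INTEGER NOT NULL
    )
    """
)

# Start with both the old single-column index (hypothetical name) and the composite one.
conn.execute("CREATE INDEX idx_sourceid ON statistics_work_table (game_sourceid)")
conn.execute(
    "CREATE INDEX GameSourceId_CostTime ON statistics_work_table (game_sourceid, cost_time)"
)

# Drop the now-redundant single-column index.
conn.execute("DROP INDEX idx_sourceid")

# A filter on game_sourceid alone can still use the composite index,
# because game_sourceid is its leftmost column.
rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT record_time FROM statistics_work_table WHERE game_sourceid = ?",
    ("standard_easy_1",),
).fetchall()
plan = [row[-1] for row in rows]  # last column is the human-readable 'detail'

# PRAGMA index_list rows are (seq, name, unique, origin, partial).
remaining = [row[1] for row in conn.execute("PRAGMA index_list(statistics_work_table)")]

print(plan)
print(remaining)
```

In MySQL, the equivalent check is to run EXPLAIN on the query after dropping the old index and confirm that the key column shows GameSourceId_CostTime.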

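Your query is slow because the database uses your single-column index on game_sourceid to find the matching rows. Then, for each row, it takes the primary key stored with the index entry and uses it to look up the row in the clustered index (which, here as in most cases, is the primary key) to fetch the cost_time value. This is called a double lookup, and it is something you want to avoid. When the matching rows are scattered across the table, these lookups touch many different pages, which is why the contiguous standard_easy_9 rows were so much faster.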

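The index I provided above is called a "covering index". It lets your query be answered from the index alone, so each row needs only one lookup instead of two, which greatly improves performance.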
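You can watch the planner make this distinction. A minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL. SQLite tables are also B-trees keyed by rowid, and a non-covering index lookup likewise needs a second trip into the table. The data is a random, scaled-down stand-in and idx_sourceid is a hypothetical name. The script builds the table twice, once with only a single-column index and once with the composite index, and prints the plan for the question's query in each case.

```python
import random
import sqlite3

# Hypothetical, scaled-down copy of the question's table (SQLite, not MySQL).
SCHEMA = """
CREATE TABLE statistics_work_table (
    unique_id     INTEGER PRIMARY KEY,  -- rowid alias, plays the role of the clustered key
    game_sourceid TEXT    NOT NULL,
    cost_time     INTEGER NOT NULL
)
"""

# The question's query, parameterised on game_sourceid.
QUERY = """
SELECT avg(cost_time) AS avgtime, min(cost_time) AS mintime, count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = ?
"""


def build(index_sql: str) -> sqlite3.Connection:
    """Create the table, fill it with random rows, and add one index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rng = random.Random(0)
    conn.executemany(
        "INSERT INTO statistics_work_table (game_sourceid, cost_time) VALUES (?, ?)",
        [(f"standard_easy_{rng.randint(1, 18)}", rng.randint(60, 3000)) for _ in range(10_000)],
    )
    conn.execute(index_sql)
    return conn


def plan_details(conn: sqlite3.Connection) -> list[str]:
    """Return the 'detail' column of EXPLAIN QUERY PLAN for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("standard_easy_1",)).fetchall()
    return [row[-1] for row in rows]


# Single-column index: cost_time must still be fetched from the table (double lookup).
single = build("CREATE INDEX idx_sourceid ON statistics_work_table (game_sourceid)")
# Composite index: every column the query needs lives in the index (covering).
covering = build(
    "CREATE INDEX GameSourceId_CostTime ON statistics_work_table (game_sourceid, cost_time)"
)

single_plan = plan_details(single)
covering_plan = plan_details(covering)
print(single_plan)
print(covering_plan)
```

With the single-column index, SQLite reports a plain "USING INDEX" search. With the composite index, it reports "USING COVERING INDEX", meaning it never reads the table rows. In MySQL, the corresponding signal is "Using index" in the Extra column of EXPLAIN.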

MySQL query runs too slowly on non-contiguous data
