Can this subquery be made to use an index?

Time: 2019-02-17 09:58:23

Tags: mysql mysql-5.6

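First, apologies for the wall of text. I did read through all the similar questions and answers I could find, but either the answers don't apply to my query, or I need a clearer understanding of the underlying problem and its solution.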

I have a table of file sizes with the associated file dates and observation timestamps. All dates are UNIX epoch time integers, in seconds:

mysql> describe name_servers;
+-----------------------+------------------+------+-----+---------+----------------+
| Field                 | Type             | Null | Key | Default | Extra          |
+-----------------------+------------------+------+-----+---------+----------------+
| server_name           | varchar(255)     | YES  |     | NULL    |                |
| file_date             | int(10) unsigned | YES  |     | NULL    |                |
| file_size             | int(10) unsigned | YES  |     | NULL    |                |
| time                  | int(10) unsigned | YES  | MUL | NULL    |                |
| poll_id               | int(11)          | NO   | PRI | NULL    | auto_increment |
+-----------------------+------------------+------+-----+---------+----------------+
5 rows in set (0.01 sec)


mysql> show index from name_servers;
+--------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table        | Non_unique | Key_name                 | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| name_servers |          0 | PRIMARY                  |            1 | poll_id     | A         |     3523218 |     NULL | NULL   |      | BTREE      |         |               |
| name_servers |          0 | index_time_servername    |            1 | time        | A         |      503316 |     NULL | NULL   | YES  | BTREE      |         |               |
| name_servers |          0 | index_time_servername    |            2 | server_name | A         |     3523218 |     NULL | NULL   | YES  | BTREE      |         |               |
+--------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

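I have to track changes in the file sizes to detect whether a file shrinks by more than 20% in any 48-hour period. Normally I would try to use MySQL window functions for this, but the MySQL version on my server doesn't support them (5.6.37, which I can't control because the server isn't managed by my team). Currently, I get the current size and the maximum size over the past 48 hours with an outer query, which finds the file size in the current row, and an inner subquery, which finds the maximum file size among the rows from the preceding 48 hours (172,800 seconds):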

mysql> select name_servers_outside.server_name,
    -> name_servers_outside.file_size,
    -> name_servers_outside.file_date,
    -> name_servers_outside.time,
    -> (select max(file_size) from name_servers where time > (name_servers_outside.time - 172800) and server_name = 'example_server') as max_file_size
    -> from name_servers as name_servers_outside
    -> where name_servers_outside.server_name = 'example_server'
    -> and name_servers_outside.time > (UNIX_TIMESTAMP() - 172800)
    -> limit 10;
+-------------------+-------------------+-------------------+------------+-----------------------+
| server_name       | file_size         | file_date         | time       | max_file_size         |
+-------------------+-------------------+-------------------+------------+-----------------------+
| example_server    |           1159544 |        1550382945 | 1550382985 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383195 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383255 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383316 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383376 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383435 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383496 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383555 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383616 |               1159580 |
| example_server    |           1159544 |        1550382945 | 1550383676 |               1159580 |
+-------------------+-------------------+-------------------+------------+-----------------------+
10 rows in set (16.11 sec)

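Retrieving just these 10 rows takes 16 seconds, and in production this query will have to retrieve 150+ rows. The inner query is doing a full scan of all 3+ million table rows, with the message "Range checked for each record (index map: 0x2)":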

mysql> explain
    -> select name_servers_outside.server_name,
    -> name_servers_outside.file_size,
    -> name_servers_outside.file_date,
    -> name_servers_outside.time,
    -> (select max(file_size) from name_servers where time > (name_servers_outside.time - 172800) and server_name = 'example_server') as max_file_size
    -> from name_servers as name_servers_outside
    -> where name_servers_outside.server_name = 'example_server'
    -> and name_servers_outside.time > (UNIX_TIMESTAMP() - 172800);
+----+--------------------+----------------------+-------+--------------------------+--------------------------+---------+------+---------+------------------------------------------------+
| id | select_type        | table                | type  | possible_keys            | key                      | key_len | ref  | rows    | Extra                                          |
+----+--------------------+----------------------+-------+--------------------------+--------------------------+---------+------+---------+------------------------------------------------+
|  1 | PRIMARY            | name_servers_outside | range | index_time_servername    | index_time_servername    | 5       | NULL |   47302 | Using index condition; Using MRR               |
|  2 | DEPENDENT SUBQUERY | name_servers         | ALL   | index_time_servername    | NULL                     | NULL    | NULL | 3533883 | Range checked for each record (index map: 0x2) |
+----+--------------------+----------------------+-------+--------------------------+--------------------------+---------+------+---------+------------------------------------------------+
2 rows in set (0.01 sec)

The problematic part seems to be this:

time > (name_servers_outside.time - 172800)

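If I run a similar query using a static integer value instead of the "name_servers_outside.time" column reference in the subquery, the index is used as expected and the query is fast: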

time > (UNIX_TIMESTAMP() - 172800)

The modified query:

mysql> select name_servers_outside.server_name,
    -> name_servers_outside.file_size,
    -> name_servers_outside.file_date,
    -> name_servers_outside.time,
    -> (select max(file_size) from name_servers where time > (UNIX_TIMESTAMP() - 172800) and server_name = 'example_server') as max_file_size
    -> from name_servers as name_servers_outside
    -> where name_servers_outside.server_name = 'example_server'
    -> and name_servers_outside.time > (UNIX_TIMESTAMP() - 172800)
    -> limit 10;
+--------------------+-------------------+-------------------+------------+-----------------------+
| server_name        | file_size         | file_date         | time       | max_file_size         |
+--------------------+-------------------+-------------------+------------+-----------------------+
| example_server     |           1159544 |        1550382945 | 1550382985 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383195 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383255 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383316 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383376 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383435 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383496 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383555 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383616 |               1159580 |
| example_server     |           1159544 |        1550382945 | 1550383676 |               1159580 |
+--------------------+-------------------+-------------------+------------+-----------------------+
10 rows in set (0.01 sec)


mysql> explain
    -> select name_servers_outside.server_name,
    -> name_servers_outside.file_size,
    -> name_servers_outside.file_date,
    -> name_servers_outside.time,
    -> (select max(file_size) from name_servers where time > (UNIX_TIMESTAMP() - 172800) and server_name = 'example_server') as max_file_size
    -> from name_servers as name_servers_outside
    -> where name_servers_outside.server_name = 'example_server'
    -> and name_servers_outside.time > (UNIX_TIMESTAMP() - 172800)
    -> limit 10;
+----+-------------+----------------------+-------+--------------------------+--------------------------+---------+------+-------+----------------------------------+
| id | select_type | table                | type  | possible_keys            | key                      | key_len | ref  | rows  | Extra                            |
+----+-------------+----------------------+-------+--------------------------+--------------------------+---------+------+-------+----------------------------------+
|  1 | PRIMARY     | name_servers_outside | range | index_time_servername    | index_time_servername    | 5       | NULL | 49042 | Using index condition; Using MRR |
|  2 | SUBQUERY    | name_servers         | range | index_time_servername    | index_time_servername    | 5       | NULL | 49042 | Using index condition; Using MRR |
+----+-------------+----------------------+-------+--------------------------+--------------------------+---------+------+-------+----------------------------------+
2 rows in set (0.00 sec)

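Thanks for reading along with me this far. Again, I apologize for the giant wall of text, but I wanted to make sure I provided enough explanatory detail to define the problem clearly.

Now, the problem I'm trying to solve is that I need to retrieve the maximum value of file_size in the 48 hours preceding each row. Each row therefore has its own unique time range for the "max(file_size)" calculation, and the result is then used to calculate the percentage change in file size. As noted above, I would normally use window functions for this, but my MySQL version (5.6.37) doesn't support them, and since I don't own this server, I can't have it upgraded to 8.0.

As always, any suggestions are appreciated. Thanks for reading!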

2 Answers:

Answer 0 (score: 1)

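I would first try adding file_size to the index_time_servername index, but I suspect the real issue is that you have to use name_servers_outside.time in the subquery. Because that column belongs to the outer table's alias, the subquery is re-evaluated for every outer row, which probably throws off the query planner.

So, how about losing the subquery and joining the table to itself on rows whose time falls within the preceding 48 hours?

Something like...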
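A minimal sketch of the index idea, under some assumptions: the existing index_time_servername is a unique index on (time, server_name), so instead of altering it the example adds a separate index. The index name and the column order (server_name first, file_size last) are illustrative choices, not something tested against the table in the question. The intent is that the equality on server_name and the range on time can both be resolved from the index, and that MAX(file_size) can be read from the index entries without visiting the rows.

```sql
-- Hypothetical index name and column order; untested against the real table.
-- server_name (equality) comes first, time (range) second,
-- and file_size last so that the index covers MAX(file_size).
ALTER TABLE name_servers
  ADD INDEX index_servername_time_size (server_name, time, file_size);
```

Re-running EXPLAIN on the original query afterwards would show whether the planner actually chooses the new index for the dependent subquery.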

SELECT
  ns.server_name,
  ns.file_size,
  ns.file_date,
  ns.time,
  MAX(previous.file_size) AS max_file_size
FROM
   name_servers AS ns
INNER JOIN name_servers AS previous
   -- match only rows for the same server, otherwise the maximum spans every server
   ON previous.server_name = ns.server_name
   -- rows observed in the 48 hours (172800 seconds) before the current row
   AND previous.time BETWEEN (ns.time - 172800) AND (ns.time - 1)
WHERE
   ns.server_name = 'example_server'
   AND ns.time > (UNIX_TIMESTAMP() - 172800)
GROUP BY
   ns.server_name,
   ns.file_size,
   ns.file_date,
   ns.time
LIMIT 10;
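To connect the join back to the original goal (detecting a shrink of more than 20% within 48 hours), the self-join can be wrapped in a derived table and filtered. This is only a sketch: the alias w and the pct_shrink column are names made up for illustration, and it assumes the pair (server_name, time) identifies a row, as the unique index in the question suggests.

```sql
-- Sketch only: list rows whose size is below 80% of the maximum
-- observed for the same server in the preceding 48 hours.
SELECT
  w.server_name,
  w.time,
  w.file_size,
  w.max_file_size,
  100 * (1 - w.file_size / w.max_file_size) AS pct_shrink  -- hypothetical column
FROM (
  SELECT
    ns.server_name,
    ns.time,
    ns.file_size,
    MAX(previous.file_size) AS max_file_size
  FROM name_servers AS ns
  INNER JOIN name_servers AS previous
    ON previous.server_name = ns.server_name
    AND previous.time BETWEEN (ns.time - 172800) AND (ns.time - 1)
  WHERE ns.server_name = 'example_server'
    AND ns.time > (UNIX_TIMESTAMP() - 172800)
  GROUP BY ns.server_name, ns.time, ns.file_size
) AS w
WHERE w.file_size < 0.8 * w.max_file_size;
```

Because of the inner join, a row with no earlier observation in its 48-hour window produces no result at all, which is acceptable here since such a row has nothing to be compared against.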

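Answer 1 (score: 0)

I apologize for the delayed reply; the solution ended up involving several components, and it took a while to test them.

The main problem I was trying to solve was one of query performance. Strictly speaking, my original query returned the expected data, but it took so long to complete that it was impractical. The solution therefore came down to finding as many ways as possible to reduce the execution time.

Here is what the solution ultimately involved: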

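  1. Per Dazz Knowles's suggestion, I replaced the subquery with an inner join, which cleaned up the code and made it easier to understand.
  2. Per Progman's suggestion, I changed the index to a single-column index on the "server_name" field.
  3. I moved the rows involved in this query into their own table, which reduced the working set of columns.
  4. I reduced the sampling rate of the application that writes rows to the table from 1 data point (1 row) per minute to 1 data point (1 row) per hour, reducing the working set of rows to 1/60th of the previous number. The combined effect of 1-4 brought the query execution time down from minutes to milliseconds.
  5. I had previously been trying to calculate "max_file_size" at runtime, with the application client submitting queries to the MySQL server concurrently for ~100 different servers and 3 different files on each server (~300 instances of the query each time the application was refreshed). This kept the MySQL server's CPU pegged at 100%, which made it impractical for real-world use, especially with multiple end users running the client application at the same time. I now run the query only from a server-side script, and only when new rows are inserted. The query therefore runs once per hour, calculating ~300 max_file_size values in a few milliseconds, and then writes max_file_size to the MySQL table as a static column. None of the values that max_file_size depends on should ever change, so once max_file_size has been written for a particular row I don't need to run the query again to update it. The application's client now only reads data from MySQL; it no longer sends queries to calculate max_file_size. With the benefit of hindsight, this approach seems like it should have been obvious from the start, but sometimes you have to do things the wrong way first to understand what is right about the right way.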
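The precompute step in item 5 could look roughly like the following. This is a sketch under stated assumptions, not the script actually used: it keeps the table name name_servers from the question (the answer moved the rows to a separate table whose name is not given), and the max_file_size column definition and the aliases cur, prev, and w are hypothetical.

```sql
-- Hypothetical column that stores the precomputed 48-hour maximum.
ALTER TABLE name_servers
  ADD COLUMN max_file_size INT UNSIGNED NULL;

-- Run by the server-side script after new rows are inserted:
-- fill in max_file_size only for rows that do not have it yet.
-- The aggregate is computed in a derived table, which MySQL materializes,
-- so the table being updated can also be read inside it.
UPDATE name_servers AS ns
INNER JOIN (
  SELECT
    cur.poll_id,
    MAX(prev.file_size) AS max_file_size
  FROM name_servers AS cur
  INNER JOIN name_servers AS prev
    ON prev.server_name = cur.server_name
    AND prev.time BETWEEN (cur.time - 172800) AND (cur.time - 1)
  WHERE cur.max_file_size IS NULL
  GROUP BY cur.poll_id
) AS w ON w.poll_id = ns.poll_id
SET ns.max_file_size = w.max_file_size;
```

Rows with no earlier observation in their 48-hour window keep max_file_size as NULL and are reconsidered on every run; a script with stricter needs might restrict the update to the poll_id values it has just inserted.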