I have the following table in MySQL:
CREATE TABLE `events` (
`pv_name` varchar(60) COLLATE utf8mb4_bin NOT NULL,
`time_stamp` bigint(20) unsigned NOT NULL,
`event_type` varchar(40) COLLATE utf8mb4_bin NOT NULL,
`has_data` tinyint(1) NOT NULL,
`data` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=COMPRESSED;
ALTER TABLE `events`
ADD PRIMARY KEY (`pv_name`,`time_stamp`),
ADD UNIQUE KEY `has_data` (`pv_name`,`has_data`,`time_stamp`);
I am trying to find the distinct set of pv_names that have no data between two given times. Both of the following queries appear to return this information:
mysql> EXPLAIN SELECT pv_name FROM events
WHERE has_data = 0
AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999
GROUP BY events.pv_name;
+----+-------------+--------+------------+-------+------------------+----------+---------+------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+---------+----------+--------------------------+
| 1 | SIMPLE | events | NULL | index | PRIMARY,has_data | has_data | 251 | NULL | 1855281 | 1.11 | Using where; Using index |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+---------+----------+--------------------------+
mysql> EXPLAIN SELECT pv_name, MAX(events.time_stamp) FROM events
WHERE has_data = 0
AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999
GROUP BY events.pv_name;
+----+-------------+--------+------------+-------+------------------+----------+---------+------+--------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+--------+----------+---------------------------------------+
| 1 | SIMPLE | events | NULL | range | PRIMARY,has_data | has_data | 251 | NULL | 203123 | 100.00 | Using where; Using index for group-by |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+--------+----------+---------------------------------------+
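I don't understand why the second query, which puts an extra restriction on what it returns (one I don't need), seems to run faster than the first. Is there a way to improve the first query to match the efficiency of the second without an aggregate on the time_stamp column?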
Edit:
Following Rick James's suggestion, I changed the has_data index:
ALTER TABLE `events`
ADD PRIMARY KEY (`pv_name`,`time_stamp`), ADD KEY `has_data` (`has_data`,`pv_name`,`time_stamp`);
This changes the EXPLAIN output to:
mysql> EXPLAIN SELECT pv_name FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | events | NULL | ref | PRIMARY,has_data | has_data | 1 | const | 267096 | 11.11 | Using where; Using index |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT pv_name, MAX(events.time_stamp) FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | events | NULL | ref | PRIMARY,has_data | has_data | 1 | const | 267096 | 11.11 | Using where; Using index |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
1 row in set, 1 warning (0.01 sec)
This seems to run faster.
Edit:
Test results requested by Rick James:
mysql> FLUSH STATUS;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT pv_name FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
.
.
.
114480 rows in set (0.34 sec)
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+--------+
| Variable_name | Value |
+----------------------------+--------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 2 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 125527 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+--------+
18 rows in set (0.01 sec)
mysql> SELECT COUNT(*) FROM events;
+----------+
| COUNT(*) |
+----------+
| 3683887 |
+----------+
1 row in set (11.66 sec)
Edit:
Run times:
mysql> SHOW INDEXES FROM events;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| events | 0 | PRIMARY | 1 | pv_name | A | 216061 | NULL | NULL | | BTREE | | |
| events | 0 | PRIMARY | 2 | time_stamp | A | 4450791 | NULL | NULL | | BTREE | | |
| events | 1 | has_data | 1 | has_data | A | 258 | NULL | NULL | | BTREE | | |
| events | 1 | has_data | 2 | pv_name | A | 496542 | NULL | NULL | | BTREE | | |
| events | 1 | has_data | 3 | time_stamp | A | 4390035 | NULL | NULL | | BTREE | | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
5 rows in set (0.00 sec)
mysql> EXPLAIN SELECT events.pv_name FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | events | NULL | ref | PRIMARY,has_data | has_data | 1 | const | 267096 | 11.11 | Using where; Using index |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT events.pv_name, MAX(events.time_stamp) FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | events | NULL | ref | PRIMARY,has_data | has_data | 1 | const | 267096 | 11.11 | Using where; Using index |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
SELECT events.pv_name FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
114480 rows in set (0.37 sec)
SELECT events.pv_name, MAX(events.time_stamp) FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
114480 rows in set (0.30 sec)
mysql> SHOW INDEXES FROM events;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| events | 0 | PRIMARY | 1 | pv_name | A | 422951 | NULL | NULL | | BTREE | | |
| events | 0 | PRIMARY | 2 | time_stamp | A | 4321990 | NULL | NULL | | BTREE | | |
| events | 0 | has_data | 1 | pv_name | A | 240067 | NULL | NULL | | BTREE | | |
| events | 0 | has_data | 2 | has_data | A | 436525 | NULL | NULL | | BTREE | | |
| events | 0 | has_data | 3 | time_stamp | A | 4205163 | NULL | NULL | | BTREE | | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
5 rows in set (0.00 sec)
mysql> EXPLAIN SELECT events.pv_name FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
+----+-------------+--------+------------+-------+------------------+----------+---------+------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+---------+----------+--------------------------+
| 1 | SIMPLE | events | NULL | index | PRIMARY,has_data | has_data | 251 | NULL | 4462633 | 1.11 | Using where; Using index |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+---------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT events.pv_name, MAX(events.time_stamp) FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
+----+-------------+--------+------------+-------+------------------+----------+---------+------+--------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+--------+----------+---------------------------------------+
| 1 | SIMPLE | events | NULL | range | PRIMARY,has_data | has_data | 251 | NULL | 240076 | 100.00 | Using where; Using index for group-by |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+--------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)
SELECT events.pv_name FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
114480 rows in set (6.79 sec)
SELECT events.pv_name, MAX(events.time_stamp) FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
114480 rows in set (2.65 sec)
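Answer 0: (score: 1)
According to the MySQL documentation on Loose Index Scan:
Any other parts of the index than those from the GROUP BY referenced in the query must be constants (that is, they must be referenced in equalities with constants), except for the argument of MIN() or MAX() functions.
In the first query, time_stamp is referenced, but not as a constant. In the second query, time_stamp is also the argument of MAX(), so Loose Index Scan is applicable in that case.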
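To make the difference concrete, here is a toy Python model of the two access patterns over a sorted list that stands in for the (pv_name, has_data, time_stamp) index. It is only a sketch: the function names and sample rows are made up for illustration, and it does not reflect how MySQL implements either plan internally. The first function walks every index entry, as the type=index plan does; the second does one seek per group to find the largest time_stamp in range, which is roughly what a loose scan for MAX() can do.

```python
import bisect

def tight_scan(index, has_data, lo, hi):
    """Distinct pv_names: examine every index entry and filter it."""
    examined = 0
    names = []
    for pv_name, flag, ts in index:
        examined += 1
        if flag == has_data and lo < ts < hi:
            if not names or names[-1] != pv_name:
                names.append(pv_name)
    return names, examined

def loose_scan_max(index, has_data, lo, hi):
    """MAX(time_stamp) per pv_name: seek within each group, then jump to the next."""
    seeks = 0
    result = {}
    pos = 0
    while pos < len(index):
        pv_name = index[pos][0]
        # Seek to the last entry at or before (pv_name, has_data, hi - 1).
        i = bisect.bisect_right(index, (pv_name, has_data, hi - 1)) - 1
        seeks += 1
        if i >= 0:
            name, flag, ts = index[i]
            if name == pv_name and flag == has_data and ts > lo:
                result[pv_name] = ts
        # Skip the remaining entries of this pv_name in a single jump.
        pos = bisect.bisect_right(index, (pv_name, float("inf"), float("inf")))
        seeks += 1
    return result, seeks

# Hypothetical sample rows: (pv_name, has_data, time_stamp)
index = sorted([
    ("pvA", 0, 10), ("pvA", 0, 20), ("pvA", 1, 30),
    ("pvB", 1, 10), ("pvB", 1, 20),
    ("pvC", 0, 5), ("pvC", 0, 50), ("pvC", 0, 500),
])

names, examined = tight_scan(index, 0, 0, 100)
maxima, seeks = loose_scan_max(index, 0, 0, 100)
print(names, examined)  # ['pvA', 'pvC'] 8
print(maxima, seeks)    # {'pvA': 20, 'pvC': 50} 6
```

Both functions return the same set of pv_names, but the second one's work grows with the number of groups rather than the number of index entries, which is consistent with the much smaller rows estimate in the second EXPLAIN.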
Answer 1: (score: 0)
Replace the UNIQUE key with
INDEX(has_data, pv_name, time_stamp) -- in this order
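It is usually better not to make an index UNIQUE unless you need the constraint. In this case, the subset (pv_name, time_stamp) is already constrained by the primary key.
When building an index, start with any = columns (has_data). This lets the rest of the processing focus on the data that is actually needed instead of stumbling over unwanted values of has_data. Put one range column (time_stamp) last, since nothing after a range can (usually) be used. Having all three columns in the index gives you a "covering" index, so EXPLAIN should say "Using index".
The index I suggest should help both queries.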
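The effect of putting the = column first can be sketched in Python with sorted lists standing in for the two key orders. This is a toy model under stated assumptions: the sample rows and function names are invented for illustration, and real B-tree access in InnoDB is more involved. With has_data leading, a single seek bounds the contiguous slice of matching entries, which resembles the type=ref, key_len=1 plan shown in the question's later EXPLAIN output.

```python
import bisect

# Hypothetical sample rows: (pv_name, has_data, time_stamp)
rows = [
    ("pvA", 0, 10), ("pvA", 1, 20), ("pvB", 1, 10),
    ("pvB", 1, 30), ("pvC", 0, 5), ("pvC", 1, 50),
]

# Original key order: (pv_name, has_data, time_stamp).
by_name_first = sorted(rows)
# Suggested key order: (has_data, pv_name, time_stamp).
by_flag_first = sorted((flag, name, ts) for name, flag, ts in rows)

def scan_name_first(index, has_data, lo, hi):
    """has_data is not the leading column, so every entry is examined."""
    examined, names = 0, []
    for name, flag, ts in index:
        examined += 1
        if flag == has_data and lo < ts < hi and name not in names[-1:]:
            names.append(name)
    return names, examined

def scan_flag_first(index, has_data, lo, hi):
    """has_data leads, so two seeks bound the contiguous has_data slice."""
    start = bisect.bisect_left(index, (has_data,))
    end = bisect.bisect_left(index, (has_data + 1,))
    examined, names = 0, []
    for flag, name, ts in index[start:end]:
        examined += 1
        if lo < ts < hi and name not in names[-1:]:
            names.append(name)
    return names, examined

print(scan_name_first(by_name_first, 0, 0, 100))  # (['pvA', 'pvC'], 6)
print(scan_flag_first(by_flag_first, 0, 0, 100))  # (['pvA', 'pvC'], 2)
```

Both scans return the same names, but the second one only touches entries where has_data already matches, and within that slice the entries are still ordered by pv_name, so grouping needs no extra sort.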
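Answer 2: (score: -1)
Under certain conditions, GROUP BY can be optimized. That is what happens in the second query. The optimization is called Loose Index Scan (see the MySQL documentation).
Perhaps this would also work if you used DISTINCT instead of GROUP BY in the first query? Otherwise, check the documentation for how the GROUP BY in your first query could be optimized.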
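Loose Index Scan
The most efficient way to process GROUP BY is when an index is used to directly retrieve the grouping columns. With this access method, MySQL uses the property of some index types that the keys are ordered (for example, BTREE). This property enables use of lookup groups in an index without having to consider all keys in the index that satisfy all WHERE conditions. This access method considers only a fraction of the keys in an index, so it is called a Loose Index Scan. When there is no WHERE clause, a Loose Index Scan reads as many keys as the number of groups, which may be a much smaller number than that of all keys. If the WHERE clause contains range predicates (see the discussion of the range join type in Section 9.8.1, "Optimizing Queries with EXPLAIN"), a Loose Index Scan looks up the first key of each group that satisfies the range conditions, and again reads the smallest possible number of keys. This is possible under the following conditions: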
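- The query is over a single table.
- The GROUP BY names only columns that form a leftmost prefix of the index and no other columns. (If, instead of GROUP BY, the query has a DISTINCT clause, all distinct attributes refer to columns that form a leftmost prefix of the index.) For example, if a table t1 has an index on (c1,c2,c3), Loose Index Scan is applicable if the query has GROUP BY c1, c2. It is not applicable if the query has GROUP BY c2, c3 (the columns are not a leftmost prefix) or GROUP BY c1, c2, c4 (c4 is not in the index).
- The only aggregate functions used in the select list (if any) are MIN() and MAX(), and all of them refer to the same column. The column must be in the index and must immediately follow the columns in the GROUP BY.
- Any other parts of the index than those from the GROUP BY referenced in the query must be constants (that is, they must be referenced in equalities with constants), except for the argument of MIN() or MAX() functions.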
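For columns in the index, full column values must be indexed, not just a prefix. For example, with c1 VARCHAR(20), INDEX (c1(10)), the index cannot be used for Loose Index Scan.
If Loose Index Scan is applicable to a query, the EXPLAIN output shows Using index for group-by in the Extra column.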
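As a small illustration of the leftmost-prefix condition only, the check can be written as a few lines of Python. The helper name is hypothetical, it compares columns in the listed order as a simplification, and it ignores the other conditions (single table, MIN()/MAX() only, constants elsewhere), so a True result here does not by itself mean MySQL will choose a loose scan.

```python
def is_leftmost_prefix(index_columns, group_by_columns):
    """True if group_by_columns equals the first len(group_by_columns) index columns."""
    n = len(group_by_columns)
    return 0 < n <= len(index_columns) and list(index_columns[:n]) == list(group_by_columns)

# The examples from the quoted documentation, with an index on (c1, c2, c3).
index_columns = ("c1", "c2", "c3")
print(is_leftmost_prefix(index_columns, ("c1", "c2")))        # True
print(is_leftmost_prefix(index_columns, ("c2", "c3")))        # False
print(is_leftmost_prefix(index_columns, ("c1", "c2", "c4")))  # False

# The two key orders from the question, both with GROUP BY pv_name.
print(is_leftmost_prefix(("pv_name", "has_data", "time_stamp"), ("pv_name",)))  # True
print(is_leftmost_prefix(("has_data", "pv_name", "time_stamp"), ("pv_name",)))  # False
```

The last two lines are consistent with the EXPLAIN output in the question: "Using index for group-by" appears only with the (pv_name, has_data, time_stamp) key order.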
Hope this helps.