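I have two tables that are identical, except that one stores its time column as TIMESTAMP and the other stores it as DATETIME. The indexes are the same. The values are the same.
But when I run
SELECT station, MAX(timestamp) AS max_timestamp FROM stations GROUP BY station;
against the table with the TIMESTAMP column, it executes very quickly. Against the table with the DATETIME column, I have yet to see the query finish. In both cases the timestamp column is indexed; only the type differs.
Where should I start looking? Or is DATETIME simply not suited to searching and indexing?
Here is what EXPLAIN gives: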
+----+-------------+----------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table    | type  | possible_keys | key   | key_len | ref  | rows | Extra                    |
+----+-------------+----------+-------+---------------+-------+---------+------+------+--------------------------+
|  1 | SIMPLE      | stations | range | NULL          | stamp | 33      | NULL | 1511 | Using index for group-by |
+----+-------------+----------+-------+---------------+-------+---------+------+------+--------------------------+

+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-------+
| id | select_type | table     | type  | possible_keys | key     | key_len | ref  | rows    | Extra |
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-------+
|  1 | SIMPLE      | stations2 | index | NULL          | station | 2       | NULL | 3025467 |       |
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-------+
SHOW CREATE TABLE:
CREATE TABLE `stations` (
  `station` varchar(10) COLLATE utf8_bin DEFAULT NULL,
  `available` smallint(6) DEFAULT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  UNIQUE KEY `stamp` (`station`,`timestamp`),
  KEY `time` (`timestamp`),
  KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin

CREATE TABLE `stations2` (
  `station` smallint(5) unsigned NOT NULL,
  `available` smallint(5) unsigned DEFAULT NULL,
  `timestamp` datetime DEFAULT NULL,
  KEY `station` (`station`),
  KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
Answer 0 (score: 1)
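You can see from the EXPLAIN output that possible_keys is NULL in both plans, meaning no index is being used to select rows. The query has no WHERE clause, so that makes sense. The key column still shows which index MySQL chose to read.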
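MySQL can use an index to find a MAX, and it can use an index to optimize a GROUP BY. To optimize the two together, however, the column inside MAX() and the column in the GROUP BY clause must both be part of one composite index, with the GROUP BY column first. Your first table has such an index: the unique key named `stamp` on (station, timestamp). The EXPLAIN output shows MySQL using it ("Using index for group-by").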
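Your second table has no such composite index, so MySQL has to do more work. It scans every entry of the `station` index (the plan estimates about 3 million rows), reads each row to get its timestamp, and keeps a running MAX for each station. If you add the same composite index to the second table, you should see similar performance from both.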
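As a minimal sketch of that suggestion, the statements below add a composite index to `stations2` and re-check the plan. The index name `station_timestamp` is a placeholder chosen for illustration. A plain index is used rather than a UNIQUE key, on the assumption that nothing guarantees (station, timestamp) pairs are unique in this table.

```sql
-- Hypothetical index name; any unused name works.
-- Column order matters: the GROUP BY column comes first,
-- followed by the column inside MAX().
ALTER TABLE stations2
  ADD INDEX station_timestamp (station, `timestamp`);

-- Re-check the plan. If the optimizer picks the new index, the Extra
-- column should report "Using index for group-by", as it does for `stations`.
EXPLAIN
SELECT station, MAX(`timestamp`) AS max_timestamp
FROM stations2
GROUP BY station;
```

Once the composite index exists, the single-column `station` index is probably redundant, since the new index begins with the same column, but check your other queries before dropping it.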
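Even so, TIMESTAMP will still slightly outperform DATETIME, because a TIMESTAMP is handled as a single 4-byte integer, which is faster to process than an 8-byte DATETIME value. (From MySQL 5.6.4 onward a DATETIME takes 5 bytes plus any fractional-seconds storage, so the gap is smaller there.) The larger the data set, the more noticeable the difference.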