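I have two MySQL tables containing millions of rows, and I am trying to write a SELECT query that fetches specific columns from both tables. My initial, optimistic expectation was that the query would run in a few seconds (around 5 seconds), given that an index is applied to the column used in the WHERE condition.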
CREATE TABLE `T1` (
`T1_id` int(15) NOT NULL AUTO_INCREMENT,
`T1_val1` varchar(45) NOT NULL,
`T1_val2` varchar(45) NOT NULL,
`T1_val3` bigint(11) NOT NULL,
`T1_val4` datetime NOT NULL,
`T1_val5` varchar(100) NOT NULL,
`T1_val6` float NOT NULL,
`T1_val7` datetime NOT NULL,
`T1_val8` varchar(100) NOT NULL,
`T1_val9` varchar(100) NOT NULL,
`T1_val10` varchar(100) NOT NULL,
PRIMARY KEY (`T1_id`),
KEY `T1_val4` (`T1_val4`)
) ENGINE=InnoDB AUTO_INCREMENT=53885653 DEFAULT CHARSET=latin1;
CREATE TABLE `T2` (
`T2_id` int(11) NOT NULL,
`T2_val1` float NOT NULL,
`T2_val2` float NOT NULL,
`T2_val3` varchar(45) NOT NULL,
PRIMARY KEY (`T2_id`),
KEY `T2_val3` (`T2_val3`),
KEY `T2_val1_2` (`T2_val1`,`T2_val2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
As you can see, there is a one-to-one relationship between the two tables: the auto-increment (AI) primary key and the foreign key match (T1_id and T2_id). We applied an index on T1_val4 of T1, which is a datetime column.

SELECT
    T1_val5,
    T2_val1,
    T2_val2,
    T2_val3,
    T1_val9,
    COUNT(T2_val1) AS cnt,
    T1_val4
FROM
    T1 USE INDEX (T1_val4)
    INNER JOIN T2
        ON T1.T1_id = T2.T2_id
WHERE
    T1_val4 BETWEEN '2016-02-18 15:00:00'
                AND '2016-02-18 16:59:59'
GROUP BY
    T2_val1,
    T2_val2,
    T2_val3,
    T1_val9,
    T1_val5
ORDER BY
    T1_val4 ASC;

As you can see, I specified an index hint to tell MySQL to use that specific index for the datetime column. However, if I extend the datetime range in the WHERE condition to a few hours, for example

BETWEEN '2016-02-18 15:00:00' AND '2016-02-18 23:59:59'

the execution time grows to 50 to 100 seconds. I am probably missing something in the logic.
EXPLAIN output (as suggested by @O.Jones):
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
| 1 | SIMPLE | T1 | range | T1_val4 | T1_val4 | 5 | NULL | 10670 | "Using index condition; Using temporary; Using filesort" |
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
| 1 | SIMPLE | T2 | eq_ref | PRIMARY | PRIMARY | 4 | T1_id | 1 | NULL |
+-------+---------------+-----------+-----------+-------------------+-----------+---------------+-----------+-----------+------------------------------------------------------------+
With the composite indexes in place, EXPLAIN shows:
+-------+---------------+-----------+-----------+------------------------------+-----------+---------------+-----------+-----------+---------------------------------------------------------------+
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
+-------+---------------+-----------+-----------+------------------------------+-----------+---------------+-----------+-----------+---------------------------------------------------------------+
| 1 | SIMPLE | T1 | range | "PRIMARY,ix_rlf" | ix_rlf | 5 | NULL | 10906 | "Using where; Using index; Using temporary; Using filesort" |
+-------+---------------+-----------+-----------+------------------------------+-----------+---------------+-----------+-----------+---------------------------------------------------------------+
| 1 | SIMPLE | T2 | eq_ref | "PRIMARY,ix_cc" | PRIMARY | 4 | T1_id | 1 | NULL |
+-------+---------------+-----------+-----------+------------------------------+-----------+---------------+-----------+-----------+---------------------------------------------------------------+
Here ix_rlf is the composite index on T1_val4, T1_val9, T1_val5, and ix_cc is the composite index on T2 suggested by @Tom Shir, composed of T2_id, T2_val1, T2_val2, T2_val3.
(With a 2-hour interval, the query returns 6632 rows and takes 6 to 7 seconds to execute.)
Answer 0 (score: 1)
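By omitting T1_val4 from your GROUP BY clause, you are relying on MySQL's nonstandard extension to GROUP BY. The T1_val4 value returned for each group is indeterminate, so you may get results you do not want. Please read this: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html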
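One way to stay within standard GROUP BY semantics is to wrap the ungrouped column in an aggregate. The following is only a sketch: choosing MIN() and the alias first_val4 are assumptions about what you want to see per group, not something stated in the question.

```sql
-- Sketch: every selected column is either grouped or aggregated,
-- so the result no longer depends on MySQL's GROUP BY extension.
-- MIN() and the alias first_val4 are assumed choices for illustration.
SELECT
    T1.T1_val5,
    T2.T2_val1,
    T2.T2_val2,
    T2.T2_val3,
    T1.T1_val9,
    COUNT(T2.T2_val1) AS cnt,
    MIN(T1.T1_val4)   AS first_val4   -- earliest timestamp in each group
FROM T1
INNER JOIN T2
    ON T1.T1_id = T2.T2_id
WHERE
    T1.T1_val4 BETWEEN '2016-02-18 15:00:00' AND '2016-02-18 16:59:59'
GROUP BY
    T2.T2_val1,
    T2.T2_val2,
    T2.T2_val3,
    T1.T1_val9,
    T1.T1_val5
ORDER BY
    first_val4 ASC;
```

If you actually want one row per distinct timestamp, add T1_val4 to the GROUP BY list instead of aggregating it.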
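In general, using BETWEEN on datetime-like columns is a bad idea because it handles the end of the range poorly: an inclusive upper bound such as 16:59:59 is easy to get wrong. If I were you, I would write a half-open range like this: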
WHERE T1_val4 >= '2016-02-18 15:00:00' AND T1_val4 < '2016-02-18 17:00:00'
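You have the right idea in indexing your datestamp column. You could try a compound covering index instead of the simple datestamp index. It looks like your query pulls about ten thousand rows from T1, so a compound covering index will help. Put all the columns the query needs into the index, with the range-scan column first. That lets MySQL satisfy the T1 part of the query with an index range scan alone, without reading the table rows, which is faster. The index should be on these columns: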
T1_val4, T1_val9, T1_val5
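Because you are using InnoDB, you do not have to include the primary key in the compound index: InnoDB secondary indexes already carry the primary key columns.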
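As a sketch, the index could be created and checked like this. The name ix_rlf is taken from the update in the question; any other name works just as well, and the simplified SELECT is only there to inspect the T1 side of the plan.

```sql
-- Compound covering index: range-scan column first, then the other
-- T1 columns the query reads. T1_id is available implicitly (InnoDB).
ALTER TABLE T1 ADD INDEX ix_rlf (T1_val4, T1_val9, T1_val5);

-- Inspect the plan for the T1 side only. "Using index" in the Extra
-- column indicates the index covers the query.
EXPLAIN
SELECT T1_id, T1_val4, T1_val9, T1_val5
FROM T1
WHERE T1_val4 >= '2016-02-18 15:00:00'
  AND T1_val4 <  '2016-02-18 17:00:00';
```

Building an index on a table with tens of millions of rows can take a while, so you may want to try it on a copy first.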
This should be a bit faster. But you are still asking MySQL to retrieve, group, and sort roughly ten thousand rows, and that is real work.
Answer 1 (score: 1)
Here is your query with table prefixes:
SELECT
T1.T1_val5,
T2.T2_val1,
T2.T2_val2,
T2.T2_val3,
T1.T1_val9,
COUNT(T2.T2_val1) AS cnt,
T1.T1_val4
FROM
T1
INNER JOIN
T2
ON T1.T1_id = T2.T2_id
WHERE
T1.T1_val4 BETWEEN '2016-02-18 15:00:00' AND '2016-02-18 16:59:59'
GROUP BY
T2.T2_val1,
T2.T2_val2,
T2.T2_val3,
T1.T1_val9,
T1.T1_val5
ORDER BY
T1.T1_val4 ASC
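I believe you can improve its performance with the right indexes. I ran your query through an SQL query optimizer that I use for my own queries, and it suggested adding these indexes: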
ALTER TABLE `T1` ADD INDEX `T1_index_1` (`T1_id`, `T1_val4`);
ALTER TABLE `T2` ADD INDEX `T2_index_1` (`T2_id`, `T2_val1`, `T2_val2`, `T2_val3`);
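Also, please post the EXPLAIN plan, as it helps to understand which indexes MySQL is currently using.
One more suggestion: remove the hint you added. MySQL usually knows better than we do how to optimize a query.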
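To produce the plan, a minimal sketch is to prefix the query with EXPLAIN and leave out the USE INDEX hint, so the output shows which index the optimizer picks on its own. The query below is the one from the question; as written it assumes ONLY_FULL_GROUP_BY is disabled, as it apparently is on your server.

```sql
-- Same query as in the question, without the index hint.
-- Compare the "key" and "rows" columns with and without the hint.
EXPLAIN
SELECT
    T1.T1_val5,
    T2.T2_val1,
    T2.T2_val2,
    T2.T2_val3,
    T1.T1_val9,
    COUNT(T2.T2_val1) AS cnt,
    T1.T1_val4
FROM T1
INNER JOIN T2
    ON T1.T1_id = T2.T2_id
WHERE
    T1.T1_val4 BETWEEN '2016-02-18 15:00:00' AND '2016-02-18 16:59:59'
GROUP BY
    T2.T2_val1,
    T2.T2_val2,
    T2.T2_val3,
    T1.T1_val9,
    T1.T1_val5
ORDER BY
    T1.T1_val4 ASC;
```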