The table structure is:
CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`from` int(10) unsigned NOT NULL,
`to` int(10) unsigned NOT NULL,
`message` text NOT NULL,
`sent` int(10) unsigned NOT NULL DEFAULT '0',
`read` tinyint(1) unsigned NOT NULL DEFAULT '0',
`direction` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `one` (`to`,`direction`,`from`,`id`),
KEY `two` (`from`,`direction`,`to`,`id`),
KEY `three` (`read`,`direction`,`to`),
KEY `four` (`read`,`direction`,`from`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
I have a strange problem. Please look at the following query:
select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test
where (
(test.to = 244975 and test.direction <> 2 and test.direction <> 3 and
(
(test.from = 204177 and test.id > 5341203) OR
(test.from = 214518 and test.id > 5336549) OR
(test.from = 231429 and test.id > 5338284) OR
(test.from = 242739 and test.id > 5339541) OR
(test.from = 243834 and test.id > 5340438) OR
(test.from = 244354 and test.id > 5337489) OR
(test.from = 244644 and test.id > 5338572) OR
(test.from = 244690 and test.id > 5338467)
)
)
or
(test.from = 244975 and test.direction <> 1 and test.direction <> 3 and
(
(test.to = 204177 and test.id > 5341203) OR
(test.to = 214518 and test.id > 5336549) OR
(test.to = 231429 and test.id > 5338284) OR
(test.to = 242739 and test.id > 5339541) OR
(test.to = 243834 and test.id > 5340438) OR
(test.to = 244354 and test.id > 5337489) OR
(test.to = 244644 and test.id > 5338572) OR
(test.to = 244690 and test.id > 5338467)
)
)
or
(test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975 and test.from not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
)
or
(test.read <> 1 and test.direction = 2 and test.from = 244975 and test.to not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
)
)
order by test.id;
If I run EXPLAIN on this query, it scans all rows:
1 SIMPLE test index PRIMARY,one,two,three,four PRIMARY 4 1440596 Using where
If I remove both "not in" conditions, it works fine:
select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test
where (
(test.to = 244975 and test.direction <> 2 and test.direction <> 3 and
(
(test.from = 204177 and test.id > 5341203) OR
(test.from = 214518 and test.id > 5336549) OR
(test.from = 231429 and test.id > 5338284) OR
(test.from = 242739 and test.id > 5339541) OR
(test.from = 243834 and test.id > 5340438) OR
(test.from = 244354 and test.id > 5337489) OR
(test.from = 244644 and test.id > 5338572) OR
(test.from = 244690 and test.id > 5338467)
)
)
or
(test.from = 244975 and test.direction <> 1 and test.direction <> 3 and
(
(test.to = 204177 and test.id > 5341203) OR
(test.to = 214518 and test.id > 5336549) OR
(test.to = 231429 and test.id > 5338284) OR
(test.to = 242739 and test.id > 5339541) OR
(test.to = 243834 and test.id > 5340438) OR
(test.to = 244354 and test.id > 5337489) OR
(test.to = 244644 and test.id > 5338572) OR
(test.to = 244690 and test.id > 5338467)
)
)
or
(test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975
)
or
(test.read <> 1 and test.direction = 2 and test.from = 244975
)
)
order by test.id;
Now EXPLAIN returns:
1 SIMPLE test index_merge PRIMARY,one,two,three,four one,two 5,5 30 Using sort_union(one,two); Using where; Using filesort
I'm not sure why it doesn't work properly. What am I missing in my indexes?
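Answer 0 (score: 5)
"I'm not sure why it doesn't work properly. What am I missing in my indexes?"
I'm fairly sure the query planner is working correctly, and in this case you are not missing anything in your indexes. The planner decided that a different index would be faster, because the two queries are very different.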
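We can get the optimizer to use a union of indexes for us, which makes the query faster. You can keep the NOT IN without changing any of the OR conditions. I ran some basic benchmarks comparing this approach with the UNION approach. Caveats apply, since your database configuration may differ considerably from mine. I ran each query 1000 times, repeated that 3 times, and took the best time for each query...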
The optimized query:
real 0m15.410s
user 0m6.681s
sys 0m2.641s
Rewritten as a set of unions:
real 0m17.747s
user 0m6.798s
sys 0m2.812s
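The "best of three batches of 1000 runs" procedure can be sketched in Python with the standard `timeit` module. This is only an illustration of the method, not the script used for the timings above; `best_of` and `run_query` are hypothetical names, and `run_query` stands for any zero-argument callable that executes one query.

```python
import timeit

def best_of(run_query, number=1000, repeat=3):
    """Return the fastest total time (seconds) over `repeat` batches.

    Each batch calls `run_query` `number` times; taking the minimum
    reduces the influence of background noise on the measurement.
    """
    return min(timeit.repeat(run_query, number=number, repeat=repeat))

# Usage with a stand-in callable (replace with a real query function).
elapsed = best_of(lambda: sum(range(100)), number=1000, repeat=3)
```

Comparing the minimum rather than the mean is a common choice when the goal is to rank two query shapes against each other.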
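Think like the optimizer and use less data
The following SQL was orders of magnitude faster in my tests on a database of about 4 million rows. The key change is the following line: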
(select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test
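This line greatly reduces the data set MySQL has to process, because we are using IN instead of NOT IN. Here is the new query; I tried not to change the original too much.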
select SQL_NO_CACHE test.id, test.from_, test.to_, test.message, test.sent, test.read_, test.direction
from (select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test
where (
(test.to_ = 244975 and test.direction <> 2 and test.direction <> 3 and test.from_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and
(
(test.from_ = 204177 and test.id > 5341203) OR
(test.from_ = 214518 and test.id > 5336549) OR
(test.from_ = 231429 and test.id > 5338284) OR
(test.from_ = 242739 and test.id > 5339541) OR
(test.from_ = 243834 and test.id > 5340438) OR
(test.from_ = 244354 and test.id > 5337489) OR
(test.from_ = 244644 and test.id > 5338572) OR
(test.from_ = 244690 and test.id > 5338467)
)
)
or
(test.from_ = 244975 and test.direction <> 1 and test.direction <> 3 and test.to_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and
(
(test.to_ = 204177 and test.id > 5341203) OR
(test.to_ = 214518 and test.id > 5336549) OR
(test.to_ = 231429 and test.id > 5338284) OR
(test.to_ = 242739 and test.id > 5339541) OR
(test.to_ = 243834 and test.id > 5340438) OR
(test.to_ = 244354 and test.id > 5337489) OR
(test.to_ = 244644 and test.id > 5338572) OR
(test.to_ = 244690 and test.id > 5338467)
))
or
(test.read_ <> 1 and test.direction <> 2 and test.direction <> 3 and test.to_ = 244975 and test.from_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690))
or
(test.read_ <> 1 and test.direction = 2 and test.from_ = 244975 and test.to_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690))
)
order by test.id;
The explain plan for this looks very different...
mysql> \. sql_fixed.sql
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 226
filtered: 100.00
Extra: Using where; Using filesort
*************************** 2. row ***************************
id: 2
select_type: DERIVED
table: test
type: index_merge
possible_keys: one,two
key: two,one
key_len: 4,4
ref: NULL
rows: 226
filtered: 100.00
Extra: Using sort_union(two,one); Using where
2 rows in set, 1 warning (0.01 sec)
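A smart optimizer can see immediately that it doesn't need most of the data, because we have told it to use an IN condition with only a few keys. Most query optimizers attach a high cost to disk access, so the optimizer will usually favor anything that reduces it.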
NOT IN vs IN
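NOT IN and IN are very different. In this case the difference lies in the access pattern, and in whether I need the data only temporarily or as part of the result set. When you use NOT IN with a few keys and the index contains millions of keys, a large number of records may have to be read if that data is part of the result set. Even with an index, NOT IN with a few keys can read millions of records from disk. With IN, the few keys are exactly the ones you need to look up, so only a small subset is used. These two access patterns are very different. The following example may help make this clear...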
1. I don't want these 10 items out of 1,000,000 records; I need the other 999,990. This reads the whole index.
2. I only want these 10 out of 1,000,000 records. This might require only one disk seek.
Number 2 is faster because of the access pattern: I find the 10 I need and stop. Number 1 may need to read a million records.
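The two access patterns can be illustrated with a toy model in Python. This is a sketch under loud assumptions: a sorted list stands in for an index, and `lookup_in` / `lookup_not_in` are hypothetical helpers. It does not model how InnoDB actually traverses a B-tree; it only counts how many entries each pattern has to touch.

```python
from bisect import bisect_left

def lookup_in(index, keys):
    """IN: one binary search per wanted key; everything else is skipped."""
    hits, probes = [], 0
    for key in keys:
        probes += 1
        pos = bisect_left(index, key)
        if pos < len(index) and index[pos] == key:
            hits.append(key)
    return hits, probes

def lookup_not_in(index, keys):
    """NOT IN: every entry is visited so it can be kept or rejected."""
    excluded = set(keys)
    hits, visited = [], 0
    for key in index:
        visited += 1
        if key not in excluded:
            hits.append(key)
    return hits, visited

index = list(range(1_000_000))   # toy "index" with a million sorted keys
keys = list(range(0, 100, 10))   # the 10 keys named in the condition

in_hits, in_probes = lookup_in(index, keys)
not_in_hits, not_in_visited = lookup_not_in(index, keys)
```

In this model the IN case touches the index 10 times, while the NOT IN case walks all 1,000,000 entries to return 999,990 of them.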
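MySQL's query optimizer sees exactly this: the last two OR branches ask for a large subset of the table or index, i.e. case 1 above. Given that, and the fact that it needs the primary key anyway, the optimizer decides that using the primary key is faster.
When you remove the NOT IN, things change. The query planner can now use the indexes, because the other two OR branches are effectively "get me the few from the many", and it performs an index_merge of the two keys that share the to and from columns as well as id.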
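To see what I mean, don't remove the "not in" parts of the query; change them to IN and see what happens. On my machine the query plan changed to use an index range scan.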
Answer 1 (score: 4)
If your MySQL version is lower than 5.0.7, a MySQL bug may be the cause; see this ticket.
Answer 2 (score: 4)
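In my experience, mixing AND and OR often leads MySQL to strange query plans. I don't have enough data to test with, but I would try rewriting your query with UNION ALL. After all, an OR in the WHERE clause is basically a UNION.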
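The idea is to break the query into smaller conditions so that MySQL can use a different index, optimized for each part, instead of having all the indexes interfere with one another.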
SELECT * FROM (
SELECT
test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
test
WHERE
test.to = 244975
AND test.direction <> 2
AND test.direction <> 3
AND (
(test.from = 204177 AND test.id > 5341203) OR
(test.from = 214518 AND test.id > 5336549) OR
(test.from = 231429 AND test.id > 5338284) OR
(test.from = 242739 AND test.id > 5339541) OR
(test.from = 243834 AND test.id > 5340438) OR
(test.from = 244354 AND test.id > 5337489) OR
(test.from = 244644 AND test.id > 5338572) OR
(test.from = 244690 AND test.id > 5338467)
)
UNION ALL
SELECT
test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
test
WHERE
test.from = 244975
AND test.direction <> 1
AND test.direction <> 3
AND (
(test.to = 204177 and test.id > 5341203) OR
(test.to = 214518 and test.id > 5336549) OR
(test.to = 231429 and test.id > 5338284) OR
(test.to = 242739 and test.id > 5339541) OR
(test.to = 243834 and test.id > 5340438) OR
(test.to = 244354 and test.id > 5337489) OR
(test.to = 244644 and test.id > 5338572) OR
(test.to = 244690 and test.id > 5338467)
)
UNION ALL
SELECT
test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
test
WHERE
test.read <> 1
AND test.direction <> 3
AND test.direction <> 2
AND test.to = 244975
AND test.from NOT IN (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
UNION ALL
SELECT
test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
test
WHERE
test.read <> 1
AND test.direction = 2
AND test.from = 244975
AND test.to NOT IN (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
) test ORDER BY test.id
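The claim that an OR in the WHERE clause behaves like a UNION can be checked on a small scale. The sketch below uses Python's built-in `sqlite3` rather than MySQL, with a made-up `msg` table and made-up data, so it says nothing about MySQL's query plans; it only shows that the two forms return the same rows. It relies on the branches being mutually exclusive, which is what makes UNION ALL (no de-duplication) safe; with overlapping branches, UNION would be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE msg (id INTEGER PRIMARY KEY, from_ INT, to_ INT, direction INT)"
)
# Synthetic rows; the modulo values are arbitrary.
conn.executemany(
    "INSERT INTO msg VALUES (?, ?, ?, ?)",
    [(i, i % 7, i % 5, i % 4) for i in range(1, 201)],
)

or_query = """
    SELECT id FROM msg
    WHERE (to_ = 3 AND from_ IN (1, 2))
       OR (to_ = 3 AND from_ NOT IN (1, 2) AND direction = 0)
    ORDER BY id
"""
union_query = """
    SELECT id FROM (
        SELECT id FROM msg WHERE to_ = 3 AND from_ IN (1, 2)
        UNION ALL
        SELECT id FROM msg WHERE to_ = 3 AND from_ NOT IN (1, 2) AND direction = 0
    ) ORDER BY id
"""
or_ids = [row[0] for row in conn.execute(or_query)]
union_ids = [row[0] for row in conn.execute(union_query)]
```

Both lists hold the same ids in the same order, so the rewrite changes only how the engine may execute the query, not what it returns.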
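Answer 3 (score: 4)
A dump of sample data to test with would have been nice, but I created some data of my own anyway. I then split each of the four outer OR conditions into its own subquery, UNIONed them, and moved the sort to the final result set.
I have run into index problems before when using complex WHERE clauses. To me it looks like you have a chat/messaging application and are trying to retrieve the messages for a specific user in a single query. Personally, I would split these into separate queries to simplify the code and the queries.
Here is my query: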
SELECT test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM (
SELECT *
FROM test
WHERE test.to = 244975
AND test.direction not in (2,3)
AND (
(test.from = 204177 AND test.id > 5341203)
OR (test.from = 214518 AND test.id > 5336549)
OR (test.from = 231429 AND test.id > 5338284)
OR (test.from = 242739 AND test.id > 5339541)
OR (test.from = 243834 AND test.id > 5340438)
OR (test.from = 244354 AND test.id > 5337489)
OR (test.from = 244644 AND test.id > 5338572)
OR (test.from = 244690 AND test.id > 5338467)
)
UNION
SELECT *
FROM test
WHERE test.from = 244975
AND test.direction not in (1,3)
AND (
(test.to = 204177 AND test.id > 5341203)
OR (test.to = 214518 AND test.id > 5336549)
OR (test.to = 231429 AND test.id > 5338284)
OR (test.to = 242739 AND test.id > 5339541)
OR (test.to = 243834 AND test.id > 5340438)
OR (test.to = 244354 AND test.id > 5337489)
OR (test.to = 244644 AND test.id > 5338572)
OR (test.to = 244690 AND test.id > 5338467)
)
UNION
SELECT *
FROM test
WHERE test.read != 1
AND test.direction not in (2,3)
AND test.to = 244975
AND test.from not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
UNION
SELECT *
FROM test
WHERE test.read != 1
AND test.direction = 2
AND test.from = 244975
AND test.to not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
) test
ORDER BY test.id;
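Answer 4 (score: 4)
This is probably due to the extra level of nesting/complexity that the IN conditions on the additional columns add to the WHERE clause.
Your second query uses index merge sort-union, which converts the WHERE clause into range conditions combined with OR.
Each value compared with IN counts as another range predicate, so the two IN conditions added in the first query each add 64 predicates.
As the number of predicates grows, at some point the optimizer decides that scanning the whole table will be faster.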
Answer 5 (score: 1)
Start with this:
select a.id, a.from, a.to, a.message, a.sent,
a.read, a.direction
from ( ( SELECT * FROM test WHERE test.to = 244975 ) UNION DISTINCT
( SELECT * FROM test WHERE test.from = 244975 ) ) a
where ... -- but change `test` to `a`
Assuming the subquery returns far fewer rows than test contains, this may be faster.
Now, use "lazy evaluation" to speed it up further:
select a.id, a.from, a.to, a.message, a.sent,
a.read, a.direction
from ( ( SELECT id FROM test WHERE test.to = 244975 ) UNION DISTINCT
( SELECT id FROM test WHERE test.from = 244975 ) ) b -- Note `b`
JOIN test AS a USING(id) -- added
where ... -- but change `test` to `a`
This may help because it doesn't haul all the columns around.
The last version only needs:
PRIMARY KEY(id)
INDEX(from, id)
INDEX(to, id)
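The "lazy evaluation" pattern (collect only the ids first, then join back for the full rows) can be tried out with a small sketch. It uses Python's built-in `sqlite3` instead of MySQL, and the `msg` table, its columns, the index names, and the data are all assumptions made for illustration. It shows that the deferred join returns the same rows as the direct query; it does not measure speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE msg (id INTEGER PRIMARY KEY, from_ INT, to_ INT, message TEXT);
    CREATE INDEX msg_from_id ON msg (from_, id);  -- mirrors INDEX(from, id)
    CREATE INDEX msg_to_id ON msg (to_, id);      -- mirrors INDEX(to, id)
""")
conn.executemany(
    "INSERT INTO msg VALUES (?, ?, ?, ?)",
    [(i, i % 11, i % 13, f"body {i}") for i in range(1, 501)],
)

user = 4
# Direct form: filter and fetch all columns in one pass.
direct = conn.execute(
    "SELECT id, message FROM msg WHERE from_ = ? OR to_ = ? ORDER BY id",
    (user, user),
).fetchall()

# Deferred form: the inner UNION touches only ids, the join fetches the rest.
deferred = conn.execute(
    """
    SELECT a.id, a.message
    FROM (SELECT id FROM msg WHERE to_ = ?
          UNION
          SELECT id FROM msg WHERE from_ = ?) AS b
    JOIN msg AS a USING (id)
    ORDER BY a.id
    """,
    (user, user),
).fetchall()
```

The inner subqueries read only `id`, which is available in both secondary indexes, so the wide `message` column is fetched only for rows that survive the filter.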