MySQL does not use an index when a "NOT IN" clause is present

Date: 2016-04-01 09:55:49

Tags: mysql

The table structure is:

CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `from` int(10) unsigned NOT NULL,
  `to` int(10) unsigned NOT NULL,
  `message` text NOT NULL,
  `sent` int(10) unsigned NOT NULL DEFAULT '0',
  `read` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `direction` tinyint(1) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `one` (`to`,`direction`,`from`,`id`),
  KEY `two` (`from`,`direction`,`to`,`id`),
  KEY `three` (`read`,`direction`,`to`),
  KEY `four` (`read`,`direction`,`from`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

I have a strange problem. Please look at the following query:

select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test 
where (

    (test.to = 244975 and test.direction <> 2 and test.direction <> 3 and 
        (
        (test.from = 204177 and test.id > 5341203) OR 
        (test.from = 214518 and test.id > 5336549) OR
        (test.from = 231429 and test.id > 5338284) OR
        (test.from = 242739 and test.id > 5339541) OR
        (test.from = 243834 and test.id > 5340438) OR
        (test.from = 244354 and test.id > 5337489) OR
        (test.from = 244644 and test.id > 5338572) OR
        (test.from = 244690 and test.id > 5338467) 
        )

    )

    or 

    (test.from = 244975 and test.direction <> 1 and test.direction <> 3 and 
        (
        (test.to = 204177 and test.id > 5341203) OR
        (test.to = 214518 and test.id > 5336549) OR
        (test.to = 231429 and test.id > 5338284) OR
        (test.to = 242739 and test.id > 5339541) OR
        (test.to = 243834 and test.id > 5340438) OR
        (test.to = 244354 and test.id > 5337489) OR
        (test.to = 244644 and test.id > 5338572) OR
        (test.to = 244690 and test.id > 5338467)
        )
    )

    or 

    (test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975  and test.from not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)

    )

    or

    (test.read <> 1 and test.direction = 2 and test.from = 244975 and test.to not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)

    )


     )



     order by test.id;

If I run EXPLAIN on this query, it goes through all the rows:

1   SIMPLE  test    index   PRIMARY,one,two,three,four  PRIMARY 4       1440596 Using where

If I remove both "NOT IN" clauses, it works fine:

select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test 
where (

    (test.to = 244975 and test.direction <> 2 and test.direction <> 3 and 
        (
        (test.from = 204177 and test.id > 5341203) OR 
        (test.from = 214518 and test.id > 5336549) OR
        (test.from = 231429 and test.id > 5338284) OR
        (test.from = 242739 and test.id > 5339541) OR
        (test.from = 243834 and test.id > 5340438) OR
        (test.from = 244354 and test.id > 5337489) OR
        (test.from = 244644 and test.id > 5338572) OR
        (test.from = 244690 and test.id > 5338467) 
        )

    )

    or 

    (test.from = 244975 and test.direction <> 1 and test.direction <> 3 and 
        (
        (test.to = 204177 and test.id > 5341203) OR
        (test.to = 214518 and test.id > 5336549) OR
        (test.to = 231429 and test.id > 5338284) OR
        (test.to = 242739 and test.id > 5339541) OR
        (test.to = 243834 and test.id > 5340438) OR
        (test.to = 244354 and test.id > 5337489) OR
        (test.to = 244644 and test.id > 5338572) OR
        (test.to = 244690 and test.id > 5338467)
        )
    )

    or 

    (test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975 

    )

    or

    (test.read <> 1 and test.direction = 2 and test.from = 244975 

    )


     )



     order by test.id;

Now EXPLAIN returns:

1   SIMPLE  test    index_merge PRIMARY,one,two,three,four  one,two 5,5     30  Using sort_union(one,two); Using where; Using filesort

I'm not sure why it isn't working properly. What am I missing in the indexes?

6 answers:

Answer 0 (score: 5)

  

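"I'm not sure why it isn't working properly. What am I missing in the indexes?"

I'm fairly sure the query planner is working correctly, and in this case you are not missing anything in your indexes. The planner decided that a different index would be faster, because the two queries are very different.

We can get the optimiser to use a union of indexes for us, which makes the query faster. You can keep the not in clauses without changing any of the or statements. I ran some basic benchmarks of my approach against the union approach. Caveats apply, since your database configuration may differ a lot from mine. I ran each query 1000 times, repeated that 3 times, and took the best time for each query.

Optimised query (shown below)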

real    0m15.410s
user    0m6.681s
sys 0m2.641s

Rewritten as a set of unions

real    0m17.747s
user    0m6.798s
sys 0m2.812s

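Think like an optimiser and use less data

The following SQL was orders of magnitude faster in my tests on a database of about 4 million rows. The key change is the following line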

(select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test 

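This line greatly reduces the data set MySQL has to work on, because we are using in rather than not in. Here is the new query; I have tried not to change the original too much. Note that the columns appear here as from_, to_ and read_.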

select SQL_NO_CACHE test.id, test.from_, test.to_, test.message, test.sent, test.read_, test.direction 
from (select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test 
where (
  (test.to_ = 244975 and test.direction <> 2 and test.direction <> 3 and test.from_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and 
        (   
        (test.from_ = 204177 and test.id > 5341203) OR  
        (test.from_ = 214518 and test.id > 5336549) OR
        (test.from_ = 231429 and test.id > 5338284) OR
        (test.from_ = 242739 and test.id > 5339541) OR
        (test.from_ = 243834 and test.id > 5340438) OR
        (test.from_ = 244354 and test.id > 5337489) OR
        (test.from_ = 244644 and test.id > 5338572) OR
        (test.from_ = 244690 and test.id > 5338467) 
        )   
    )   
    or  
    (test.from_ = 244975 and test.direction <> 1 and test.direction <> 3 and test.to_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and 
        (   
        (test.to_ = 204177 and test.id > 5341203) OR
        (test.to_ = 214518 and test.id > 5336549) OR
        (test.to_ = 231429 and test.id > 5338284) OR
        (test.to_ = 242739 and test.id > 5339541) OR
        (test.to_ = 243834 and test.id > 5340438) OR
        (test.to_ = 244354 and test.id > 5337489) OR
        (test.to_ = 244644 and test.id > 5338572) OR
        (test.to_ = 244690 and test.id > 5338467)
        ))  
    or  
    (test.read_ <> 1 and test.direction <> 2 and test.direction <> 3 and test.to_ = 244975  and test.from_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690))
    or  
    (test.read_ <> 1 and test.direction = 2 and test.from_ = 244975 and test.to_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690))
     )   
     order by test.id;

The explain plan for this looks very different...

mysql> \. sql_fixed.sql
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 226
     filtered: 100.00
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: test
         type: index_merge
possible_keys: one,two
          key: two,one
      key_len: 4,4
          ref: NULL
         rows: 226
     filtered: 100.00
        Extra: Using sort_union(two,one); Using where
2 rows in set, 1 warning (0.01 sec)

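A smart optimiser can see immediately that it does not need most of the data, because we have told it to use an IN clause with only a few keys. Most query optimisers attach a high cost to disk access, so the optimiser usually favours anything that reduces it.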

NOT IN vs IN

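not in and in are very different. The difference in this case is the access pattern, and whether I need the data only temporarily or as part of the result set. When you use not in with a few keys and the index contains millions of keys, a large number of records may have to be read if that data is part of the result set. Even with an index, not in with a few keys can read millions of records from disk. With in, those few keys are exactly the ones you need to find, so only a small subset is touched. The two access patterns are very different. The following example may help make this clear...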

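1. I don't want these 10 items out of 1,000,000 records; I need the other 999,990. This reads the whole index.
2. I only want these 10 out of 1,000,000 records. This might require only one disk seek.

Number 2 is faster because of the access pattern: I find the 10 I need and stop. Number 1 may have to read a million records.

MySQL's query optimiser is seeing exactly this: the last two OR clauses ask for a large subset of the data from the table or index, i.e. case 1 above. Seeing this, together with the fact that it needs the primary key anyway, the optimiser decides that using the primary key is faster.

When you remove the not in clauses, things change. The query planner can now use the indexes, because the other two or clauses are effectively "get me the few from the many", and it performs an index_merge of the two keys that share the to and from columns as well as id.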
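The difference between the two access patterns can be estimated on your own data with a count. This is only a sketch using the table and values from the question; the aliases in_rows and not_in_rows are made up for illustration, and the numbers depend entirely on your data.

```sql
-- Count how many of this user's received messages each predicate selects.
-- `from` and `to` are backtick-quoted because they are reserved words.
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matches.
SELECT
    SUM(`from` IN (204177, 214518, 231429, 242739,
                   243834, 244354, 244644, 244690))     AS in_rows,
    SUM(`from` NOT IN (204177, 214518, 231429, 242739,
                       243834, 244354, 244644, 244690)) AS not_in_rows
FROM test
WHERE `to` = 244975;
```

If not_in_rows is much larger than in_rows, the not in branch is the "give me almost everything" case described above.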

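To see what I mean, don't remove the not in parts of the query; change them to in and see what happens. On my machine the query plan changed to use a range index.

Answer 1 (score: 4)

If your MySQL version is older than 5.0.7, a MySQL bug may be the cause.

See this ticket in the MySQL bug tracker: https://bugs.mysql.com/bug.php?id=10561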
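A minimal way to try this experiment is to run EXPLAIN on one branch in isolation, once with not in and once with in. This sketch takes the third branch of the question's where clause; the plans you get depend on your data and MySQL version, so no output is shown.

```sql
-- Third branch as written in the question (NOT IN).
EXPLAIN
SELECT test.id
FROM test
WHERE test.read <> 1
  AND test.direction <> 3
  AND test.direction <> 2
  AND test.to = 244975
  AND test.from NOT IN (204177, 214518, 231429, 242739,
                        243834, 244354, 244644, 244690);

-- Same branch with NOT IN changed to IN, for comparison only:
-- this returns different rows, it is not a replacement query.
EXPLAIN
SELECT test.id
FROM test
WHERE test.read <> 1
  AND test.direction <> 3
  AND test.direction <> 2
  AND test.to = 244975
  AND test.from IN (204177, 214518, 231429, 242739,
                    243834, 244354, 244644, 244690);
```

Compare the type, key and rows columns of the two plans.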

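Answer 2 (score: 4)

In my experience, mixing AND and OR often leads to strange query plans in MySQL. I don't have enough data to test, but I would try rewriting your query with UNION ALL. After all, an OR in the WHERE clause is basically a UNION. (UNION ALL does not remove duplicates, which is fine here because no row can satisfy more than one of the four branches.)

The idea is to break the query into smaller conditions so that MySQL can use a different index, optimised for each part, instead of having all of them interfere with each other.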

SELECT * FROM (
SELECT
    test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
    test 
WHERE
    test.to = 244975
    AND test.direction <> 2
    AND test.direction <> 3
    AND (
        (test.from = 204177 AND test.id > 5341203) OR 
        (test.from = 214518 AND test.id > 5336549) OR
        (test.from = 231429 AND test.id > 5338284) OR
        (test.from = 242739 AND test.id > 5339541) OR
        (test.from = 243834 AND test.id > 5340438) OR
        (test.from = 244354 AND test.id > 5337489) OR
        (test.from = 244644 AND test.id > 5338572) OR
        (test.from = 244690 AND test.id > 5338467) 
    )
UNION ALL
SELECT
    test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
    test 
WHERE
    test.from = 244975
    AND test.direction <> 1
    AND test.direction <> 3
    AND (
        (test.to = 204177 and test.id > 5341203) OR
        (test.to = 214518 and test.id > 5336549) OR
        (test.to = 231429 and test.id > 5338284) OR
        (test.to = 242739 and test.id > 5339541) OR
        (test.to = 243834 and test.id > 5340438) OR
        (test.to = 244354 and test.id > 5337489) OR
        (test.to = 244644 and test.id > 5338572) OR
        (test.to = 244690 and test.id > 5338467)
    )
UNION ALL
SELECT
    test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
    test 
WHERE
    test.read <> 1
    AND test.direction <> 3
    AND test.direction <> 2
    AND test.to = 244975
    AND test.from NOT IN (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
UNION ALL
SELECT
    test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM
    test 
WHERE
    test.read <> 1
    AND test.direction = 2
    AND test.from = 244975
    AND test.to NOT IN (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
) test ORDER BY test.id
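To check whether each part really gets its own index, you can run EXPLAIN on the rewritten query. The sketch below is deliberately cut down to two branches with a single from/to pair each, so it is not the full query; EXPLAIN should list the SELECTs of the union as separate rows, each with its own key column.

```sql
-- Reduced two-branch version of the UNION ALL rewrite, for inspection only.
EXPLAIN
SELECT * FROM (
    SELECT test.id, test.from, test.to
    FROM test
    WHERE test.to = 244975
      AND test.from = 204177
      AND test.id > 5341203
    UNION ALL
    SELECT test.id, test.from, test.to
    FROM test
    WHERE test.from = 244975
      AND test.to = 204177
      AND test.id > 5341203
) test
ORDER BY test.id;
```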

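Answer 3 (score: 4)

It would have been nice to have a dump of sample data to test against, but I created some of my own anyway. I then split each of the four outer OR conditions into a sub-query, UNIONed them, and moved the sort to the final result set.

I have run into index problems with complex WHERE clauses before. To me it looks like you have a chat/messaging application and are trying to fetch the messages for a particular user in a single query. Personally, I would split these into separate queries to simplify the code and the queries.

Here is my query: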

SELECT test.id, test.from, test.to, test.message, test.sent, test.read, test.direction
FROM (
  SELECT *
  FROM test
  WHERE test.to = 244975
    AND test.direction not in (2,3)
    AND (
      (test.from = 204177 AND test.id > 5341203)
      OR (test.from = 214518 AND test.id > 5336549)
      OR (test.from = 231429 AND test.id > 5338284)
      OR (test.from = 242739 AND test.id > 5339541)
      OR (test.from = 243834 AND test.id > 5340438)
      OR (test.from = 244354 AND test.id > 5337489)
      OR (test.from = 244644 AND test.id > 5338572)
      OR (test.from = 244690 AND test.id > 5338467)
    )
  UNION
  SELECT *
  FROM test
  WHERE test.from = 244975
    AND test.direction not in (1,3)
    AND (
      (test.to = 204177 AND test.id > 5341203)
      OR (test.to = 214518 AND test.id > 5336549)
      OR (test.to = 231429 AND test.id > 5338284)
      OR (test.to = 242739 AND test.id > 5339541)
      OR (test.to = 243834 AND test.id > 5340438)
      OR (test.to = 244354 AND test.id > 5337489)
      OR (test.to = 244644 AND test.id > 5338572)
      OR (test.to = 244690 AND test.id > 5338467)
    )
  UNION
  SELECT *
  FROM test
  WHERE test.read != 1
    AND test.direction not in (2,3)
    AND test.to = 244975
    AND test.from not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
  UNION
  SELECT *
  FROM test
  WHERE test.read != 1
    AND test.direction = 2
    AND test.from = 244975
    AND test.to not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
) test
ORDER BY test.id;

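Answer 4 (score: 4)

This is probably due to the extra level of nesting/complexity that the in conditions on additional columns add to the where clause.

Your second query uses an index merge sort-union, which converts the where clause into range conditions combined with OR.

Each value used in an in comparison counts as another range predicate, so adding the two in conditions to the first query adds 64 predicates each.

As the number of predicates grows, at some point the optimiser decides that scanning the whole table will be faster.

Answer 5 (score: 1)

Start with this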
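One way to look at the range alternatives the optimiser weighed is the optimizer trace. This is a sketch that assumes MySQL 5.6 or later, where the optimizer_trace session variable and the INFORMATION_SCHEMA.OPTIMIZER_TRACE table exist; the traced statement here is just one branch of the question's query.

```sql
-- Turn tracing on for this session only.
SET optimizer_trace = 'enabled=on';

-- Statement to be traced (EXPLAIN is enough to record the planning steps).
EXPLAIN
SELECT test.id
FROM test
WHERE test.read <> 1
  AND test.direction = 2
  AND test.from = 244975
  AND test.to NOT IN (204177, 214518, 231429, 242739,
                      243834, 244354, 244644, 244690);

-- The trace of the last traced statement, as a JSON document.
SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```

The range analysis part of the trace lists the candidate indexes with their estimated costs, which shows where a scan was judged cheaper.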

select  a.id, a.from, a.to, a.message, a.sent,
        a.read, a.direction
    from  ( ( SELECT * FROM test WHERE test.to   = 244975 ) UNION DISTINCT
            ( SELECT * FROM test WHERE test.from = 244975 ) ) a
    where ...  -- but change `test` to `a`

Assuming the subquery returns far fewer rows than test contains, this may run faster.

Now, use "lazy evaluation" to speed it up further:

select  a.id, a.from, a.to, a.message, a.sent,
        a.read, a.direction
    from  ( ( SELECT id FROM test WHERE test.to   = 244975 ) UNION DISTINCT
            ( SELECT id FROM test WHERE test.from = 244975 ) ) b  -- Note `b`
    JOIN test AS a  USING(id)   -- added
    where ...  -- but change `test` to `a`

This may help because it does not haul all the columns around.

This last version needs only

PRIMARY KEY(id)
INDEX(`from`, id)
INDEX(`to`,   id)
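As a sketch, the two secondary indexes could be added like this. The index names from_id and to_id are made up for illustration, and the existing indexes one to four are left in place; whether to drop them is a separate decision.

```sql
-- Add the two indexes suggested above; PRIMARY KEY (id) already exists.
-- `from` and `to` must be backtick-quoted because they are reserved words.
ALTER TABLE test
    ADD INDEX from_id (`from`, id),
    ADD INDEX to_id   (`to`, id);
```

Re-run EXPLAIN afterwards to confirm that the id-only subqueries pick up the new indexes.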