Question

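I have a (large) table that I query with 3 fields in the WHERE clause. One of those fields (date) has an index, and I am looking for hits from the last 3 months. It will never be a fast query, but I would at least expect it to use the index on this date.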

This is my query:

SELECT id
FROM statsTable
WHERE 1
   AND ip            =  'ipgoeshere'
   AND anotherstring =  'Quite a long string goes here, something like this or even longer'
   AND `date`        >  DATE_ADD( NOW( ) , INTERVAL -3 MONTH )

Here is the EXPLAIN:

id  select_type table       type    possible_keys   key     key_len ref     rows    Extra
1   SIMPLE      statsTable  ALL     date            NULL    NULL    NULL    4833721 Using where; Using filesort

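That is a full table scan. The rows figure is off from my actual row count because InnoDB only estimates it, but it covers all of them. It takes about 30 seconds.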

If I force the index like this, I get the expected result:

SELECT id
FROM statsTable FORCE INDEX (date)
WHERE 1
   AND ip            =  'ipgoeshere'
   AND anotherstring =  'Quite a long string goes here, something like this or even longer'
   AND `date`        >  DATE_ADD( NOW( ) , INTERVAL -3 MONTH )

Again, the EXPLAIN:

id  select_type table       type    possible_keys   key     key_len ref     rows    Extra
1   SIMPLE      statsTable  range   date            date    8       NULL    1120172 Using where

Now we 'only' have about a million rows, but this runs 'lightning' fast (around 3 seconds instead of 30).

The table:

CREATE TABLE IF NOT EXISTS `statsTable` (

  `id`            int(11) unsigned NOT NULL AUTO_INCREMENT,
  `date`          datetime NOT NULL,
  `ip`            varchar(15) NOT NULL,
  `anotherstring` varchar(255) NOT NULL,

  PRIMARY KEY (`id`),
  KEY `date` (`date`)

) ENGINE=InnoDB;

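The weird thing: I have this same table in another database (on a different server), and in that instance the index IS used. I can't see what the problem is here. Is there a setting I missed? Or could it be some other small difference? Apart from such differences, I don't see why the query above wouldn't use the key.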

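I have already run OPTIMIZE TABLE and, as @DhruvPathak suggested, ANALYZE TABLE, but the EXPLAIN stays the same. I also tried an ALTER TABLE to rebuild the index, as a friend suggested. No luck.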

Answer 1

Run ANALYZE TABLE once and see whether that helps correct the optimizer's choice.

http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html
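A minimal sketch of that step, using the table name from the question. ANALYZE TABLE refreshes the key distribution statistics, and SHOW INDEX then displays the refreshed Cardinality estimate for each index, including `date`. InnoDB builds these figures by sampling index pages, so they are approximations and can differ between two servers holding similar data.

```sql
-- Refresh the index statistics the optimizer uses to estimate row counts.
ANALYZE TABLE statsTable;

-- Show the refreshed estimates; compare the Cardinality column
-- for the `date` key before and after, or between the two servers.
SHOW INDEX FROM statsTable;
```

Afterwards, re-running the EXPLAIN from the question shows whether the optimizer's choice of key has changed.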

This may also help: MySQL not using indexes with WHERE IN clause?

Can you try editing the query?

Why is there a redundant TRUE condition, WHERE 1, in the query?

Change

SELECT id
FROM statsTable
WHERE 1
   AND ip            =  'ipgoeshere'
   AND anotherstring =  'Quite a long string goes here, something like this or even longer'
   AND `date`        >  DATE_ADD( NOW( ) , INTERVAL -3 MONTH )

to

SELECT id
FROM statsTable
WHERE `date`         >  DATE_ADD( NOW( ) , INTERVAL -3 MONTH )
   AND ip            =  'ipgoeshere'
   AND anotherstring =  'Quite a long string goes here, something like this or even longer'

Answer 2

Based on the shape of your query, the ideal index would be on

ip, date

or

ip, date, anotherstring <-- this could be overkill

and

ORDER BY NULL <-- eliminate the filesort
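A sketch of this suggestion against the table from the question, using the two-column variant. The index name ip_date is hypothetical; any name works. Putting the equality column (ip) first and the range column (`date`) second lets MySQL jump to a single ip value and read only that value's recent dates. On the ORDER BY NULL part: the MySQL manual documents it mainly as a way to suppress the implicit sorting of GROUP BY results in older versions, so it is worth confirming with EXPLAIN whether it changes anything for this particular query.

```sql
-- Two-column index: equality column first, range column second.
-- ip_date is a hypothetical index name.
ALTER TABLE statsTable ADD INDEX ip_date (ip, `date`);

-- The question's query with ORDER BY NULL appended, as this answer suggests.
SELECT id
FROM statsTable
WHERE ip            = 'ipgoeshere'
  AND anotherstring = 'Quite a long string goes here, something like this or even longer'
  AND `date`        > DATE_ADD(NOW(), INTERVAL -3 MONTH)
ORDER BY NULL;
```

On a table of roughly 4.8 million rows, building the new index can take a while.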

Finally, it may be that your other database contains fewer records.

Answer 3

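The index is not used because the execution planner decided it is better to scan the whole table than to use the index. This happens when the index is not selective enough for the query.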

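If the dates in the range check cover more than 10-20% of the whole table, the planner decides that a (sequential) scan of the whole table will be faster than using the index and retrieving the rows that fall in that range. That retrieval would not be sequential, because the matching rows are spread all over the table.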

That is why you see different behavior with different data sets.
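The EXPLAIN output in the question already hints at this. The forced range scan on `date` estimates 1,120,172 rows, against an estimate of 4,833,721 rows for the full scan:

1,120,172 / 4,833,721 ≈ 0.23

So roughly 23% of the table falls in the three-month window, above the 10-20% band described here. Both numbers are InnoDB estimates. A sketch of a query that measures the actual fraction is below; it has to read the whole table (or the whole `date` index), so it is not fast either.

```sql
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the rows
-- inside the 3-month window; dividing by COUNT(*) gives the fraction.
SELECT SUM(`date` > DATE_ADD(NOW(), INTERVAL -3 MONTH)) / COUNT(*) AS fraction_in_range
FROM statsTable;
```

Running it on both servers would show whether recent rows make up a different share of the table on each, which would be consistent with the two servers choosing different plans.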

To make your query optimal, you can create an index on:

(ip, yourDateField)

or

(anotherstring, yourDateField)

or

(ip, anotherstring, yourDateField)

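I think the first option is selective enough. There is no need to add a long VARCHAR(255) field to the index. Alternatively, keep using FORCE INDEX, which seems to work fine in your case.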
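A sketch of how to check the first option once it exists, for example after running ALTER TABLE statsTable ADD INDEX ip_date (ip, `date`) (ip_date is a hypothetical name). If the planner accepts the new index, the key column of EXPLAIN should name it, and the rows estimate should be far smaller than the 1,120,172 rows of the forced `date` scan. How much smaller depends on how many hits each IP has.

```sql
-- Assumes an index on (ip, `date`) exists; ip_date is a hypothetical name.
EXPLAIN
SELECT id
FROM statsTable
WHERE ip            = 'ipgoeshere'
  AND anotherstring = 'Quite a long string goes here, something like this or even longer'
  AND `date`        > DATE_ADD(NOW(), INTERVAL -3 MONTH);
```

If the key column still shows NULL or date, FORCE INDEX (ip_date) remains available as a fallback, the same way FORCE INDEX (date) worked in the question.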

Index not used where it can and should be
