Selecting a non-indexed column increases 'Sending data' time 25x - why, and how can it be improved?

Asked: 2011-06-19 15:53:13

Tags: mysql sql indexing

Given this table on a local MySQL 5.1 instance with the query cache turned off:

show create table product_views\G
*************************** 1. row ***************************
       Table: product_views
Create Table: CREATE TABLE `product_views` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `dateCreated` datetime NOT NULL,
  `dateModified` datetime DEFAULT NULL,
  `hibernateVersion` bigint(20) DEFAULT NULL,
  `brandName` varchar(255) DEFAULT NULL,
  `mfrModel` varchar(255) DEFAULT NULL,
  `origin` varchar(255) NOT NULL,
  `price` float DEFAULT NULL,
  `productType` varchar(255) DEFAULT NULL,
  `rebateDetailsViewed` tinyint(1) NOT NULL,
  `rebateSearchZipCode` int(11) DEFAULT NULL,
  `rebatesFoundAmount` float DEFAULT NULL,
  `rebatesFoundCount` int(11) DEFAULT NULL,
  `siteSKU` varchar(255) DEFAULT NULL,
  `timestamp` datetime NOT NULL,
  `uiContext` varchar(255) DEFAULT NULL,
  `siteVisitId` bigint(20) NOT NULL,
  `efficiencyLevel` varchar(255) DEFAULT NULL,
  `siteName` varchar(255) DEFAULT NULL,
  `clicks` varchar(1024) DEFAULT NULL,
  `rebateFormDownloaded` tinyint(1) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `siteVisitId` (`siteVisitId`,`siteSKU`),
  KEY `FK52C29B1E3CAB9CC4` (`siteVisitId`),
  KEY `rebateSearchZipCode_idx` (`rebateSearchZipCode`),
  KEY `FIND_UNPROCESSED_IDX` (`siteSKU`,`siteVisitId`,`timestamp`),
  CONSTRAINT `FK52C29B1E3CAB9CC4` FOREIGN KEY (`siteVisitId`) REFERENCES `site_visits` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=32909504 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

This query takes ~3s:

    SELECT pv.id, pv.siteSKU
      FROM product_views pv 
CROSS JOIN site_visits sv 
     WHERE pv.siteVisitId = sv.id 
       AND pv.siteSKU = 'foo' 
       AND sv.siteId = 'bar' 
       AND sv.postProcessed = 1 
       AND pv.timestamp >= '2011-05-19 00:00:00' 
       AND pv.timestamp < '2011-06-18 00:00:00';

But this one (with a non-indexed column added to the SELECT list) takes ~65s:

    SELECT pv.id, pv.siteSKU, pv.hibernateVersion 
      FROM product_views pv 
CROSS JOIN site_visits sv 
     WHERE pv.siteVisitId = sv.id 
       AND pv.siteSKU = 'foo' 
       AND sv.siteId = 'bar' 
       AND sv.postProcessed = 1 
       AND pv.timestamp >= '2011-05-19 00:00:00' 
       AND pv.timestamp < '2011-06-18 00:00:00';

Nothing in the 'where' or 'from' clauses differs. All the extra time is spent in 'Sending data':

mysql> show profile for query 1;
+--------------------+-----------+
| Status             | Duration  |
+--------------------+-----------+
| starting           |  0.000155 |
| Opening tables     |  0.000029 |
| System lock        |  0.000007 |
| Table lock         |  0.000019 |
| init               |  0.000072 |
| optimizing         |  0.000032 |
| statistics         |  0.000316 |
| preparing          |  0.000034 |
| executing          |  0.000002 |
| Sending data       | 63.530402 |
| end                |  0.000044 |
| query end          |  0.000005 |
| freeing items      |  0.000091 |
| logging slow query |  0.000002 |
| logging slow query |  0.000109 |
| cleaning up        |  0.000004 |
+--------------------+-----------+
16 rows in set (0.00 sec)

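I understand that using a non-indexed column in the WHERE clause slows things down, but why does it matter here, in the SELECT list? And what can be done to improve the latter case, given that I will actually want to SELECT * from product_views?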

EXPLAIN output

mysql> explain extended select pv.id, pv.siteSKU from product_views pv cross join site_visits sv where pv.siteVisitId=sv.id and pv.siteSKU='foo' and sv.siteId='bar' and sv.postProcessed=1 and pv.timestamp>='2011-05-19 00:00:00' and pv.timestamp<'2011-06-18 00:00:00';
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+----------+--------------------------+
| id | select_type | table | type   | possible_keys                                       | key                  | key_len | ref                  | rows  | filtered | Extra                    |
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+----------+--------------------------+
|  1 | SIMPLE      | pv    | ref    | siteVisitId,FK52C29B1E3CAB9CC4,FIND_UNPROCESSED_IDX | FIND_UNPROCESSED_IDX | 258     | const                | 41810 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | sv    | eq_ref | PRIMARY,post_processed_idx                          | PRIMARY              | 8       | clabs.pv.siteVisitId |     1 |   100.00 | Using where              |
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> explain extended select pv.id, pv.siteSKU, pv.hibernateVersion from product_views pv cross join site_visits sv where pv.siteVisitId=sv.id and pv.siteSKU='foo' and sv.siteId='bar' and sv.postProcessed=1 and pv.timestamp>='2011-05-19 00:00:00' and pv.timestamp<'2011-06-18 00:00:00';
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+----------+-------------+
| id | select_type | table | type   | possible_keys                                       | key                  | key_len | ref                  | rows  | filtered | Extra       |
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+----------+-------------+
|  1 | SIMPLE      | pv    | ref    | siteVisitId,FK52C29B1E3CAB9CC4,FIND_UNPROCESSED_IDX | FIND_UNPROCESSED_IDX | 258     | const                | 41810 |   100.00 | Using where |
|  1 | SIMPLE      | sv    | eq_ref | PRIMARY,post_processed_idx                          | PRIMARY              | 8       | clabs.pv.siteVisitId |     1 |   100.00 | Using where |
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

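UPDATE1: Splitting into 2 queries brings the total time down to the ~30s range

Not sure why, but splitting the latter query into the following reduces latency from ~65s to ~30s:

1) SELECT pv.id .... // same FROM and WHERE clauses as above

2) SELECT * FROM product_views WHERE id IN (idList); // idList is the set of ids returned by query 1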
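
The two-step approach above can also be written as a single statement, a pattern sometimes called a deferred join: the inner query collects matching ids using only indexed columns, and the outer join fetches the full rows by primary key. This is only a sketch and has not been timed against this table. The aliases `ids` and `full_pv` are made up for illustration.

```sql
-- Inner query touches only columns present in FIND_UNPROCESSED_IDX
-- (InnoDB secondary indexes also carry the primary key, so pv.id is covered).
SELECT full_pv.*
  FROM (
        SELECT pv.id
          FROM product_views pv
          JOIN site_visits sv ON sv.id = pv.siteVisitId
         WHERE pv.siteSKU = 'foo'
           AND sv.siteId = 'bar'
           AND sv.postProcessed = 1
           AND pv.timestamp >= '2011-05-19 00:00:00'
           AND pv.timestamp <  '2011-06-18 00:00:00'
       ) AS ids
  -- Outer join reads full rows only for the ids that survived every filter.
  JOIN product_views full_pv ON full_pv.id = ids.id;
```

If this helps, the likely reason is that full rows are fetched only for the roughly 3k ids that pass every filter, rather than for every row matching the siteSKU prefix of the index.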

UPDATE2: Table size

  • The table has 10M rows
  • The query returns about 3k rows

3 Answers:

Answer 0 (score: 4)

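When you select only indexed columns, MySQL reads just the index and does not need to read the table data. As far as I remember, this is called an index-covered query (a covering index). The EXPLAIN output in the question shows exactly this: "Using index" appears in the Extra column for pv in the first plan but not in the second. When the select list contains a column that is not in the index being used, MySQL has to go to the table and read the row data from it. That is why index-covered queries are so much faster.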
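
One way to get the slower query covered again would be an index that also contains `hibernateVersion`. The following is a sketch under assumptions: the index name is hypothetical, and the column order (equality column, then the range column, then the columns that are only read) is a guess at what suits this query, not something verified on this data.

```sql
-- Hypothetical index name and column order.
-- The primary key (id) is stored in every InnoDB secondary index already.
ALTER TABLE product_views
  ADD INDEX pv_sku_ts_covering_idx (siteSKU, `timestamp`, siteVisitId, hibernateVersion);

-- Re-check the plan: "Using index" in Extra for pv would mean the query is covered.
EXPLAIN
SELECT pv.id, pv.siteSKU, pv.hibernateVersion
  FROM product_views pv
  JOIN site_visits sv ON sv.id = pv.siteVisitId
 WHERE pv.siteSKU = 'foo'
   AND sv.siteId = 'bar'
   AND sv.postProcessed = 1
   AND pv.timestamp >= '2011-05-19 00:00:00'
   AND pv.timestamp <  '2011-06-18 00:00:00';
```

Note that this cannot help a SELECT *: no practical index would contain every column, so the full rows still have to be read in that case.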

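As for improvements: how many rows are in the table, how many does the query return, what is the buffer pool size, how much RAM is available, and so on?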
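
The numbers asked for above can be collected with a few statements. This is a minimal sketch that reuses the filter values from the question.

```sql
-- Configured InnoDB buffer pool size, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Number of rows the query actually returns.
SELECT COUNT(*)
  FROM product_views pv
  JOIN site_visits sv ON sv.id = pv.siteVisitId
 WHERE pv.siteSKU = 'foo'
   AND sv.siteId = 'bar'
   AND sv.postProcessed = 1
   AND pv.timestamp >= '2011-05-19 00:00:00'
   AND pv.timestamp <  '2011-06-18 00:00:00';
```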

Answer 1 (score: 1)

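From what I have read about SHOW PROFILE, 'Sending data' is part of the execution process and has almost nothing to do with sending the actual data to the client. You can take a look at this thread.
Also, the MySQL docs say this about 'Sending data':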

  

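"The thread is reading and processing rows for a SELECT statement, and sending data to the client. Because operations occurring during this state tend to perform large amounts of disk access (reads), it is often the longest-running state over the lifetime of a given query."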

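In my opinion, MySQL would do better not to lump "reading and processing rows for a SELECT statement" and "sending data" into a single state, especially one named 'Sending data'. It causes a lot of confusion.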
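
To check whether the time in 'Sending data' really goes into reading rows, one option is to compare InnoDB's read counters before and after the query. This is a rough sketch that assumes an otherwise idle local instance, since these counters are global. The Query_ID passed to SHOW PROFILE is also an assumption and should be taken from the SHOW PROFILES listing.

```sql
SET profiling = 1;

-- Snapshot before: Innodb_buffer_pool_reads counts reads that had to go to disk,
-- Innodb_buffer_pool_read_requests counts logical read requests.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

SELECT pv.id, pv.siteSKU, pv.hibernateVersion
  FROM product_views pv
  JOIN site_visits sv ON sv.id = pv.siteVisitId
 WHERE pv.siteSKU = 'foo'
   AND sv.siteId = 'bar'
   AND sv.postProcessed = 1
   AND pv.timestamp >= '2011-05-19 00:00:00'
   AND pv.timestamp <  '2011-06-18 00:00:00';

-- Snapshot after: the difference is the read work attributable to the query.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

-- Find the SELECT's Query_ID, then show its per-state timings.
SHOW PROFILES;
SHOW PROFILE FOR QUERY 2;  -- assumed id; use the one reported by SHOW PROFILES
```

A large jump in Innodb_buffer_pool_reads for the slow query, and little or none for the fast one, would fit the documentation's description of this state as mostly disk reads.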

Answer 2 (score: 0)

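I don't know MySQL internals at all, but Darhazer's explanation looks like the winner to me. When you add a non-indexed field, the entire row has to be retrieved, and your rows are very wide. I can't tell from the names how (or whether) the table is denormalized, but I suspect it is. site name and site sku smell like they belong in a site lookup table referenced by an FK. rebates found amount and rebates found count sound like statistics that should come from a join to a separate product rebate table. And so on.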
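
Before restructuring anything, it may be worth measuring how wide the rows actually are. Here is a small sketch using `information_schema`. For InnoDB these figures are estimates, and `DATABASE()` assumes the session is already using the schema that holds the table.

```sql
-- Estimated row count, average row size and total data/index size, in bytes.
SELECT table_rows,
       avg_row_length,
       data_length,
       index_length
  FROM information_schema.TABLES
 WHERE table_schema = DATABASE()
   AND table_name   = 'product_views';
```

A small avg_row_length would suggest that row width is not the main cost, which leaves the random reads by primary key as the more likely one.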