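Correct indexing/query when using INNER JOIN

Date: 2012-03-03 21:12:20

Tags: mysql indexing

I'm not sure how to build a proper index that correctly covers category / log_code. Maybe I also need to change my query? Any input is appreciated!

All SELECTs contain: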

SELECT logentry_id, date, log_codes.log_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code 
ORDER BY logentry_id DESC

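The query can be as above, but usually there is a WHERE that specifies which category of log_codes to show, and/or the partner, and/or the customer. Examples of WHERE: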

WHERE partner_id = 1

WHERE log_codes.category_overview = 1

WHERE partner_id = 1 AND log_codes.category_overview = 1

WHERE partner_id = 1 AND customer_id = 1 AND log_codes.category_overview = 1

Database structure:

CREATE TABLE IF NOT EXISTS `log_codes` (
  `log_code` smallint(6) NOT NULL,
  `log_desc` varchar(255),
  `category_mail` tinyint(1) NOT NULL,
  `category_overview` tinyint(1) NOT NULL,
  `category_cron` tinyint(1) NOT NULL,
  `category_documents` tinyint(1) NOT NULL,
  `category_error` tinyint(1) NOT NULL,
  PRIMARY KEY (`log_code`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `log_entries` (
  `logentry_id` int(11) NOT NULL AUTO_INCREMENT,
  `date` datetime NOT NULL,
  `log_code` smallint(6) NOT NULL,
  `partner_id` int(11) NOT NULL,
  `customer_id` int(11) NOT NULL,
  PRIMARY KEY (`logentry_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;

EDIT: Added indexes on the fields; here is the SHOW INDEXES output:

+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table     | Non_unique | Key_name              | Seq_in_index | Column_name           | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_codes |          0 | PRIMARY               |            1 | log_code              | A         |          97 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_mail         |            1 | category_mail         | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_overview     |            1 | category_overview     | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_cron         |            1 | category_cron         | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_documents    |            1 | category_documents    | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_error        |            1 | category_error        | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table       | Non_unique | Key_name     | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_entries |          0 | PRIMARY      |            1 | logentry_id  | A         |      163020 |     NULL | NULL   |      | BTREE      |         |               |
| log_entries |          1 | log_code     |            1 | log_code     | A         |          90 |     NULL | NULL   |      | BTREE      |         |               |
| log_entries |          1 | partner_id   |            1 | partner_id   | A         |           6 |     NULL | NULL   | YES  | BTREE      |         |               |
| log_entries |          1 | customer_id  |            1 | customer_id  | A         |       20377 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

EDIT 2: Added composite indexes on log_codes: (log_code, category_overview) and (category_overview, log_code). Also (customer_id, partner_id) on log_entries.

Here is some EXPLAIN output (the query returns 66818 rows):

EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table       | type   | possible_keys                       | key        | key_len | ref                  | rows   | Extra                       |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
|  1 | SIMPLE      | log_entries | ref    | log_code,partner_id                 | partner_id | 2       | const                | 156110 | Using where; Using filesort |
|  1 | SIMPLE      | log_codes   | eq_ref | PRIMARY,code_overview,overview_code | PRIMARY    | 2       | log_entries.log_code |      1 | Using where                 |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+

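I also have some LEFT JOINs which I assumed would not affect the index design, but they cause a "Using temporary" problem. Here is the EXPLAIN output (the query returns 66818 rows):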

EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries 
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code 
LEFT JOIN partners ON log_entries.partner_id = partners.partner_id 
LEFT JOIN joined_table1 ON log_entries.t1_id = joined_table1.t1_id 
LEFT JOIN joined_table2 ON log_entries.t2_id = joined_table2.t2_id 
LEFT JOIN joined_table3 ON log_entries.t3_id = joined_table3.t3_id 
LEFT JOIN joined_table4 ON joined_table3.t4_id = joined_table4.t4_id 
LEFT JOIN joined_table5 ON log_entries.t5_id = joined_table5.t5_id 
LEFT JOIN joined_table6 ON log_entries.t6_id = joined_table6.t6_id
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC;
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table         | type   | possible_keys                       | key           | key_len | ref                      | rows | Extra                                        |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | log_codes     | ref    | PRIMARY,code_overview,overview_code | overview_code | 1       | const                    |   54 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | log_entries   | ref    | log_code,partner_id                 | log_code      | 2       | log_codes.log_code       | 1811 | Using where                                  |
|  1 | SIMPLE      | partners      | const  | PRIMARY                             | PRIMARY       | 2       | const                    |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table1 | eq_ref | PRIMARY                             | PRIMARY       | 1       | log_entries.t1_id        |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table2 | eq_ref | PRIMARY                             | PRIMARY       | 1       | log_entries.t2_id        |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table3 | eq_ref | PRIMARY                             | PRIMARY       | 3       | log_entries.t3_id        |    1 |                                              |
|  1 | SIMPLE      | joined_table4 | eq_ref | PRIMARY                             | PRIMARY       | 3       | joined_table3.t4_id      |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table5 | eq_ref | PRIMARY                             | PRIMARY       | 4       | log_entries.t5_id        |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table6 | eq_ref | PRIMARY                             | PRIMARY       | 4       | log_entries.t6_id        |    1 | Using index                                  |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+

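I don't know whether this is good or bad, but a subquery seems to get rid of "Using temporary". Here is the EXPLAIN output for two common scenarios. This query returns 66818 rows: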

EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type        | table       | type            | possible_keys                       | key        | key_len | ref                  | rows   | Extra                       |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
|  1 | PRIMARY            | log_entries | ref             | log_code,partner_id                 | partner_id | 2       | const                | 156110 | Using where; Using filesort |
|  1 | PRIMARY            | log_codes   | eq_ref          | PRIMARY,code_overview               | PRIMARY    | 2       | log_entries.log_code |      1 |                             |
|  2 | DEPENDENT SUBQUERY | log_codes   | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY    | 2       | func                 |      1 | Using where                 |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+

For the overview of a single customer, the query returns 12 rows:

EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_entries.customer_id = 10000
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| id | select_type        | table       | type            | possible_keys                                    | key          | key_len | ref                  | rows | Extra                       |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
|  1 | PRIMARY            | log_entries | ref             | log_code,partner_id,customer_id,customer_partner | customer_id  | 4       | const                |   27 | Using where; Using filesort |
|  1 | PRIMARY            | log_codes   | eq_ref          | PRIMARY,code_overview                            | PRIMARY      | 2       | log_entries.log_code |    1 |                             |
|  2 | DEPENDENT SUBQUERY | log_codes   | unique_subquery | PRIMARY,code_overview,overview_code              | PRIMARY      | 2       | func                 |    1 | Using where                 |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+

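2 answers:

Answer 0: (score: 2)

When it comes to indexing, there is no simple rule that guarantees success. You need to look at a reasonably typical period of activity to work out what will help performance.

All of the following comments are therefore guidelines rather than absolute rules: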

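An index helps most when it takes you quickly to a small subset of the data, rather than merely eliminating half of it (for example, an index on a gender column with only M / F as possible entries has very few distinct values). So, for example, how unique are the values in log_code, category_overview and partner_id?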
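One way to answer that question is to count the distinct values directly. The following is a minimal sketch against the tables from the question; the column aliases are made up for illustration.

```sql
-- Number of distinct values per candidate column in log_entries.
-- A count close to total_rows means the column is highly selective;
-- a count of only a handful means an index on it alone filters little.
SELECT COUNT(*)                    AS total_rows,
       COUNT(DISTINCT log_code)    AS distinct_log_codes,
       COUNT(DISTINCT partner_id)  AS distinct_partners,
       COUNT(DISTINCT customer_id) AS distinct_customers
FROM log_entries;

-- How the log codes split across the category_overview flag.
SELECT category_overview, COUNT(*) AS codes
FROM log_codes
GROUP BY category_overview;
```

The Cardinality column of SHOW INDEXES gives an estimate of the same numbers, but only for columns that are already indexed.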

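For a given query it is often helpful to have a "covering" index, one that contains every field the query uses. However, if the query uses too many fields from a single table, you would instead want an index containing just the fields in the WHERE or JOIN clauses to identify the rows, and then go back to the table storage to fetch the remaining fields.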
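As a rough illustration of a covering index, assuming the log_entries table from the question: the index name `ix_partner_cover` and the column order are hypothetical choices for this sketch, not a recommendation derived from measurements.

```sql
-- Hypothetical covering index: every log_entries column that the
-- query below touches is present in the index itself.
ALTER TABLE log_entries
    ADD INDEX ix_partner_cover (partner_id, log_code, logentry_id, `date`);

-- If the optimizer can answer from the index alone, the Extra column
-- of the EXPLAIN output for log_entries includes "Using index".
EXPLAIN
SELECT logentry_id, `date`, log_code
FROM log_entries
WHERE partner_id = 1;
```

An index this wide costs disk space and slows inserts, so it is only worth keeping if EXPLAIN and real timings show a benefit.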

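So, based on the information you have provided, a candidate index on log_codes would include log_code and category_overview; similarly, one on log_entries would include log_code and partner_id. However, you would need to evaluate how they actually affect performance.

Keep in mind that any given index may improve read performance for an individual query, but it also slows down writes to the table, because more information has to be written, i.e. each new row must also be fitted into the additional indexes. That is why you need to look at the big picture of activity on the database to decide whether an index is worth having.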

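Answer 1: (score: 1)

Well done for taking the time to update your question with the requested details. I'm sorry if that sounds patronizing, but the number of people who are not prepared to take the time to help themselves is astounding.

Adding a composite index on (customer_id, partner_id) to the log_entries table should bring a significant benefit for your example WHERE clauses.

The SHOW INDEXES output for the log_codes table suggests that it is not currently populated, as it shows NULL for everything except the PK. Is that the case?

EDIT Sorry. I have just read your comment on KAJ's answer detailing the table contents. It might be worth running the SHOW INDEXES statement again, as it looks like MySQL may still have been building its statistics.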
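If the statistics still look stale, they can be refreshed explicitly before checking again. A minimal sketch using the table names from the question:

```sql
-- Recalculate the key distribution statistics for both tables.
ANALYZE TABLE log_codes, log_entries;

-- Re-check the Cardinality column after the refresh.
SHOW INDEX FROM log_codes;
SHOW INDEX FROM log_entries;
```

Note that the Cardinality values are estimates, so small differences from the real distinct counts are expected.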

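Adding a composite index on (log_code, category_overview) to the log_codes table should help, but you will need to check the EXPLAIN output to see whether it is actually being used.

As a very rough general rule, you want to create composite indexes starting with the column that has the highest cardinality, but this is not always the case. It depends heavily on the data distribution and the query structure.

UPDATE I created a mock-up of your dataset and added the following indexes. They give a significant improvement for your example WHERE clauses: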

ALTER TABLE `log_codes`
    ADD INDEX `IX_overview_code` (`category_overview`, `log_code`);

ALTER TABLE `log_entries`
    ADD INDEX `IX_partner_code` (`partner_id`, `log_code`),
    ADD INDEX `IX_customer_partner_code` (`customer_id`, `partner_id`, `log_code`);

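The last index is fairly expensive in terms of disk space and degraded insert performance, but it gives very fast SELECTs for the final WHERE clause example. My sample dataset has just over 1M records in the log_entries table, evenly distributed across partner and customer IDs. Three of your sample WHERE clauses execute in less than a second, but the one with category_overview as the only criterion is very slow, although still sub-second with only 200k rows.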
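To confirm whether these indexes are picked up on your own data, the final WHERE clause example can be run through EXPLAIN. This is a sketch only; the table aliases `le` and `lc` are introduced here for brevity, and `log_desc` is the column name from the posted CREATE TABLE.

```sql
-- Check the `key` column of the output: for log_entries it should name
-- one of the new composite indexes if the optimizer chose to use it.
EXPLAIN
SELECT le.logentry_id, le.`date`, lc.log_desc
FROM log_entries AS le
INNER JOIN log_codes AS lc ON le.log_code = lc.log_code
WHERE le.partner_id = 1
  AND le.customer_id = 1
  AND lc.category_overview = 1
ORDER BY le.logentry_id DESC;
```

Which index the optimizer chooses depends on the table statistics, so the plan on a mock dataset may differ from the plan on the production data.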