I have two MySQL (Ver 14.14 Distrib 5.5.49) tables that look like this:
CREATE TABLE `Document` (
`Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`CompanyCode` int(10) unsigned NOT NULL,
`B` int(10) unsigned NOT NULL,
`C` int(10) unsigned NOT NULL,
`DocumentCode` int(10) unsigned NOT NULL,
`E` int(11) DEFAULT '0',
`EpochSeconds` int(11) DEFAULT '0',
`G` int(10) unsigned NOT NULL,
`H` int(10) unsigned NOT NULL,
`I` int(11) DEFAULT '0',
`J` int(11) DEFAULT '0',
`K` varchar(48) DEFAULT '',
PRIMARY KEY (`Id`),
KEY `Idx1` (`CompanyCode`),
KEY `Idx2` (`B`,`C`),
KEY `Idx3` (`CompanyCode`,`DocumentCode`),
KEY `Idx4` (`CompanyCode`,`B`,`C`),
KEY `Idx5` (`H`),
KEY `Idx6` (`CompanyCode`,`K`),
KEY `Idx7` (`K`),
KEY `Idx8` (`K`,`E`),
KEY `NEWIDX` (`DocumentCode`,`EpochSeconds`)
) ENGINE=MyISAM AUTO_INCREMENT=397783215 DEFAULT CHARSET=latin1;
CREATE TABLE `Company` (
`Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`CompanyCode` int(10) unsigned NOT NULL,
`CompanyName` varchar(150) NOT NULL,
`C` varchar(2) NOT NULL,
`D` varchar(10) NOT NULL,
`E` varchar(150) NOT NULL,
PRIMARY KEY (`Id`),
KEY `Idx1` (`CompanyCode`),
KEY `Idx2` (`CompanyName`),
KEY `Idx3` (`C`),
KEY `Idx4` (`D`,`C`),
KEY `Idx5` (`E`)
) ENGINE=MyISAM AUTO_INCREMENT=9218804 DEFAULT CHARSET=latin1;
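I have omitted most of the column definitions from Company because I don't want to complicate this question unnecessarily, but none of the omitted columns are involved in any KEY definition.
Document has about 12.5 million rows, and Company has about 600,000 rows.
I added KEY NEWIDX to Document to facilitate the following query: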
SELECT Document.*, Company.CompanyName FROM Document, Company WHERE Document.DocumentCode = ? AND Document.CompanyCode = Company.CompanyCode ORDER BY Document.EpochSeconds DESC LIMIT 0, 30;
Execution plan:
+----+-------------+----------+------+----------------------------+------+---------+------------------------------+--------+---------------------------------+
| id | select_type | table    | type | possible_keys              | key  | key_len | ref                          | rows   | Extra                           |
+----+-------------+----------+------+----------------------------+------+---------+------------------------------+--------+---------------------------------+
|  1 | SIMPLE      | Company  | ALL  | Idx1                       | NULL | NULL    | NULL                         | 593729 | Using temporary; Using filesort |
|  1 | SIMPLE      | Document | ref  | Idx1,Idx4,Idx6,NEWIDX,Idx3 | Idx3 | 8       | db.Company.CompanyCode,const |      3 |                                 |
+----+-------------+----------+------+----------------------------+------+---------+------------------------------+--------+---------------------------------+
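If the value of Document.DocumentCode above is anything other than 8, the query returns immediately (0.00 seconds). If the value is 8, the query takes anywhere between 38 and 45 seconds. If I remove Company from the query, e.g.
SELECT * FROM Document WHERE DocumentCode = 8 ORDER BY EpochSeconds DESC LIMIT 0, 30;
Execution plan: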
+----+-------------+-----------+------+---------------+------------+---------+-------+---------+-------------+
| id | select_type | table     | type | possible_keys | key        | key_len | ref   | rows    | Extra       |
+----+-------------+-----------+------+---------------+------------+---------+-------+---------+-------------+
|  1 | SIMPLE      | Documents | ref  | NEWIDX        | NEWIDX     | 4       | const | 3654177 | Using where |
+----+-------------+-----------+------+---------------+------------+---------+-------+---------+-------------+
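...then the query returns immediately (0.00 seconds).
The possible values of Document.DocumentCode span a range of 369, with a reasonable distribution across those values. There are ~3.15 million rows in Document with DocumentCode = 8. There are about 1.5 million rows in Document with DocumentCode = 9, and that query returns immediately. I also ran the mysqlcheck utility on the Document table, but it reported no problems.
Why would the query for DocumentCode = 8 take so long when Company is joined in, while every other value of DocumentCode returns so quickly?
Here is a comparison of the execution plans, for DocumentCode = 8: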
+----+-------------+----------+------+----------------------------+------+---------+------------------------------+--------+---------------------------------+
| id | select_type | table    | type | possible_keys              | key  | key_len | ref                          | rows   | Extra                           |
+----+-------------+----------+------+----------------------------+------+---------+------------------------------+--------+---------------------------------+
|  1 | SIMPLE      | Company  | ALL  | Idx1                       | NULL | NULL    | NULL                         | 593729 | Using temporary; Using filesort |
|  1 | SIMPLE      | Document | ref  | Idx1,Idx4,Idx6,NEWIDX,Idx3 | Idx3 | 8       | db.Company.CompanyCode,const |      3 |                                 |
+----+-------------+----------+------+----------------------------+------+---------+------------------------------+--------+---------------------------------+
and for DocumentCode = 9:
+----+-------------+----------+------+----------------------------+--------+---------+--------------------------+---------+-------------+
| id | select_type | table    | type | possible_keys              | key    | key_len | ref                      | rows    | Extra       |
+----+-------------+----------+------+----------------------------+--------+---------+--------------------------+---------+-------------+
|  1 | SIMPLE      | Document | ref  | Idx1,Idx4,Idx6,NEWIDX,Idx3 | NEWIDX | 4       | const                    | 1953090 | Using where |
|  1 | SIMPLE      | Company  | ref  | Idx1                       | Idx1   | 4       | db.Document.CompanyCode  |       1 |             |
+----+-------------+----------+------+----------------------------+--------+---------+--------------------------+---------+-------------+
They are clearly different, but I don't understand them well enough to explain what is happening. Also, running ANALYZE TABLE Document and ANALYZE TABLE Company reports OK.
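Answer 0 (score: 1)
The reason for this behavior lies in the way MySQL optimizes the query, or at least tries to. You can see it in the EXPLAIN output: MySQL changes which table it uses as the base of the query. With DocumentCode = 8 it starts from Company; with DocumentCode = 9 it starts from Document. MySQL apparently believes that for DocumentCode = 8 it will be faster to skip the index and use the other table as the base. Why, I don't know.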
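I would suggest writing the join explicitly, with Document listed first, so the intended table order is clear (note that for a plain inner JOIN the MySQL optimizer may still reorder the tables):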
SELECT Document.*, Company.CompanyName
FROM Document
JOIN Company ON Document.CompanyCode = Company.CompanyCode
WHERE Document.DocumentCode = ?
ORDER BY Document.EpochSeconds desc LIMIT 0, 30;
MySQL even lets you tell it which index it should use:
SELECT Document.*, Company.CompanyName
FROM Document
JOIN Company USE INDEX (Idx1) ON Document.CompanyCode = Company.CompanyCode
WHERE Document.DocumentCode = ?
ORDER BY Document.EpochSeconds desc LIMIT 0, 30;
You can also try FORCE INDEX instead of USE INDEX, which is stronger. But I would guess it uses Idx1 by default anyway.
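As a hedged sketch of the FORCE INDEX variant, here is the same query with the hint swapped in. The literal 8 stands in for the ? placeholder and is only there to reproduce the slow case; whether the hint changes the plan on this data is an assumption to check with EXPLAIN.

```sql
-- Same query as above, with FORCE INDEX in place of USE INDEX.
-- Idx1 is the (CompanyCode) key on Company from the question's schema.
SELECT Document.*, Company.CompanyName
FROM Document
JOIN Company FORCE INDEX (Idx1) ON Document.CompanyCode = Company.CompanyCode
WHERE Document.DocumentCode = 8
ORDER BY Document.EpochSeconds DESC
LIMIT 0, 30;
```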
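But note that in the plan that starts from Company, your new index NEWIDX is not used: MySQL has to perform the join first and then sort the joined result, which has no index (hence Using temporary; Using filesort). That makes the ORDER BY on the result a very expensive operation.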
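One way to test this is to hint NEWIDX on Document itself and look at the plan. This is a sketch under the assumption that the optimizer accepts the hint for this data; the literal 8 replaces the ? placeholder. If the Extra column no longer shows Using temporary; Using filesort, the index is being read in EpochSeconds order and the separate sort is avoided.

```sql
-- Sketch: ask MySQL to read Document through NEWIDX (DocumentCode, EpochSeconds)
-- and inspect the resulting plan. The value 8 is the slow case from the question.
EXPLAIN
SELECT Document.*, Company.CompanyName
FROM Document FORCE INDEX (NEWIDX)
JOIN Company ON Document.CompanyCode = Company.CompanyCode
WHERE Document.DocumentCode = 8
ORDER BY Document.EpochSeconds DESC
LIMIT 0, 30;
```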
Answer 1 (score: 1)
Use STRAIGHT_JOIN to force the order in which MySQL joins the tables:
SELECT Document.*,
Company.CompanyName
FROM Document
STRAIGHT_JOIN Company
ON Document.CompanyCode = Company.CompanyCode
WHERE Document.DocumentCode = ?
ORDER BY Document.EpochSeconds DESC
LIMIT 0, 30;
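To confirm that the hint has the intended effect, the same statement can be prefixed with EXPLAIN. A sketch, using the literal 8 from the question in place of ?: with STRAIGHT_JOIN the left table is read first, so Document should appear in the first row of the plan; which key it picks is still up to the optimizer.

```sql
-- Sketch: check the join order chosen for the slow value (8).
EXPLAIN
SELECT Document.*, Company.CompanyName
FROM Document
STRAIGHT_JOIN Company ON Document.CompanyCode = Company.CompanyCode
WHERE Document.DocumentCode = 8
ORDER BY Document.EpochSeconds DESC
LIMIT 0, 30;
```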