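We have a simple database with 4 tables: files, file_versions, users, organizations. I select all files owned by a particular organization, with some conditions on the trashing date, using this query: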
select * FROM organizations o
LEFT JOIN users u ON o.id=u.organization_id
LEFT JOIN files f ON u.user_identity=f.owner_identity
LEFT JOIN file_versions fv ON f.owner_identity=fv.owner_identity
AND f.local_path=fv.local_path
WHERE o.id=2001237 AND o.trashed_file_age_limit>=1
AND f.trashing_date<(1433943058 - o.trashed_file_age_limit*24*60*60);
EXPLAIN SELECT shows me that the optimizer chose the wrong table order, which differs from the order in the query (organizations -> users -> files -> file_versions):
mysql> explain select * FROM organizations o LEFT JOIN users u ON o.id=u.organization_id LEFT JOIN files f ON u.user_identity=f.owner_identity LEFT JOIN file_versions fv ON f.owner_identity=fv.owner_identity AND f.local_path=fv.local_path WHERE o.id=2001237 AND o.trashed_file_age_limit>=1 AND f.trashing_date<(1433943058 - o.trashed_file_age_limit*24*60*60);
+----+-------------+-------+--------+----------------------------------+----------+---------+----------------------------------------------------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------+----------+---------+----------------------------------------------------+-----------+-------------+
| 1 | SIMPLE | o | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | f | ALL | PRIMARY | NULL | NULL | NULL | 109615125 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY,identity,organization_id | identity | 36 | filemirror.f.owner_identity | 1 | Using where |
| 1 | SIMPLE | fv | ref | PRIMARY | PRIMARY | 3035 | filemirror.u.user_identity,filemirror.f.local_path | 1 | |
+----+-------------+-------+--------+----------------------------------+----------+---------+----------------------------------------------------+-----------+-------------+
4 rows in set (0.01 sec)
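Because of the full scan of the files table this query is slow, and I had to use STRAIGHT_JOIN (which is not equivalent to LEFT JOIN) to fix the table order and make the query faster.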
mysql> explain select * FROM organizations o STRAIGHT_JOIN users u ON o.id=u.organization_id STRAIGHT_JOIN files f ON u.user_identity=f.owner_identity STRAIGHT_JOIN file_versions fv ON f.owner_identity=fv.owner_identity AND f.local_path=fv.local_path WHERE o.id=2001237 AND o.trashed_file_age_limit>=1 AND f.trashing_date<(1433943058 - o.trashed_file_age_limit*24*60*60);
+----+-------------+-------+-------+----------------------------------+---------+---------+----------------------------------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----------------------------------+---------+---------+----------------------------------------------------+---------+-------------+
| 1 | SIMPLE | o | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | u | ref | PRIMARY,identity,organization_id | PRIMARY | 4 | const | 36 | |
| 1 | SIMPLE | f | ref | PRIMARY | PRIMARY | 36 | filemirror.u.user_identity | 6089324 | Using where |
| 1 | SIMPLE | fv | ref | PRIMARY | PRIMARY | 3035 | filemirror.u.user_identity,filemirror.f.local_path | 1 | |
+----+-------------+-------+-------+----------------------------------+---------+---------+----------------------------------------------------+---------+-------------+
4 rows in set (0.00 sec)
My question is: why is MySQL allowed to change the table order in a non-symmetric join operation?
Table structure:
CREATE TABLE `file_versions` (
`owner_identity` char(36) character set latin1 collate latin1_bin NOT NULL,
`local_path` varchar(999) character set utf8 NOT NULL,
`version_number` int(11) unsigned NOT NULL,
...
PRIMARY KEY (`owner_identity`,`local_path`,`version_number`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
CREATE TABLE `files` (
`owner_identity` char(36) character set latin1 collate latin1_bin NOT NULL,
`local_path` varchar(999) character set utf8 NOT NULL,
`version_number` int(11) unsigned NOT NULL,
..
`trashing_date` int(11) default NULL,
...
PRIMARY KEY (`owner_identity`,`local_path`),
KEY `trashing_date` (`trashing_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
CREATE TABLE `organizations` (
`id` int(11) NOT NULL,
...
`trashed_file_age_limit` int(11) default NULL,
...
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
CREATE TABLE `users` (
`organization_id` int(11) NOT NULL,
`id` int(11) NOT NULL,
`user_identity` char(36) character set latin1 collate latin1_bin NOT NULL,
...
PRIMARY KEY (`organization_id`,`id`),
UNIQUE KEY `identity` (`user_identity`),
KEY `organization_id` (`organization_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
MySQL version: 5.5
Answer 0 (score: 1)
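Looking at the row estimates, MySQL thinks it needs to read 109M rows of the files table in the first plan, versus about 6M rows for each of 36 users, roughly 216M rows, in the second plan. So it seems reasonable to it to just read all 109M rows once, in primary key order, rather than reading them in separate chunks. These estimates don't look very plausible to me, so I would try running ANALYZE TABLE on files, but they are only estimates, so you may not get better numbers.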
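A minimal sketch of that check, assuming you can afford to run it against the live files table. The multiplication just reproduces the estimate from the second EXPLAIN (36 users times 6089324 rows per user); SHOW INDEX lets you compare the Cardinality column before and after the statistics are refreshed.

```sql
-- Estimated rows read from files in the STRAIGHT_JOIN plan:
-- 36 users * 6089324 rows per user (figures taken from the EXPLAIN output).
SELECT 36 * 6089324 AS plan2_estimated_rows;

-- Refresh the index statistics the optimizer bases its estimates on.
ANALYZE TABLE files;

-- Inspect the Cardinality column for the PRIMARY and trashing_date indexes.
SHOW INDEX FROM files;
```

For InnoDB these statistics come from sampling, so the numbers can move between runs and still be far from the real distribution.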
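When you use a LEFT JOIN and then put a condition on the joined table in the WHERE clause, the join effectively becomes an INNER JOIN, as Strawberry said in the comments: the condition can only ever be true for rows that actually have a value, so the NULL-extended rows are filtered out anyway. MySQL is therefore free to reorder those joins, and the optimizer seems to prefer doing the "real inner" joins first, which could be a second reason for this plan.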
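To make that concrete, here is how the optimizer can effectively read the original query. The condition on f.trashing_date rejects rows where f is NULL, and f can only match when u matched, so both of those joins behave as inner joins; only the join to file_versions has no such condition and stays a true LEFT JOIN. This is an illustration of the rewrite, not the optimizer's literal internal form.

```sql
-- Same result set as the original query: the WHERE condition on
-- f.trashing_date discards every NULL-extended row from u and f.
SELECT *
FROM organizations o
INNER JOIN users u ON o.id = u.organization_id
INNER JOIN files f ON u.user_identity = f.owner_identity
LEFT JOIN file_versions fv ON f.owner_identity = fv.owner_identity
                          AND f.local_path = fv.local_path
WHERE o.id = 2001237
  AND o.trashed_file_age_limit >= 1
  AND f.trashing_date < (1433943058 - o.trashed_file_age_limit*24*60*60);
```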
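You can also try using STRAIGHT_JOIN in a different way: if you put it right after SELECT, the optimizer uses your join order (where possible; usually some odd right joins and other corner cases prevent it) without changing the join type on any particular table. Used that way it acts as a kind of flag, in the same way SQL_NO_CACHE does, rather than as a special join type.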
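Applied to the query from the question, that would look roughly like this. The joins stay LEFT JOINs exactly as you wrote them; only the STRAIGHT_JOIN modifier after SELECT is new.

```sql
-- STRAIGHT_JOIN as a SELECT modifier: tables are joined in the order
-- listed in FROM, and the LEFT JOIN semantics are left untouched.
EXPLAIN SELECT STRAIGHT_JOIN *
FROM organizations o
LEFT JOIN users u ON o.id = u.organization_id
LEFT JOIN files f ON u.user_identity = f.owner_identity
LEFT JOIN file_versions fv ON f.owner_identity = fv.owner_identity
                          AND f.local_path = fv.local_path
WHERE o.id = 2001237
  AND o.trashed_file_age_limit >= 1
  AND f.trashing_date < (1433943058 - o.trashed_file_age_limit*24*60*60);
```

Check the EXPLAIN output first to confirm the order is o, u, f, fv before dropping the EXPLAIN keyword.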
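Then, to do even better, you can try adding an index on files (owner_identity, trashing_date). That should help locate the matching files for each user, instead of searching globally with only the current key on (trashing_date).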
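A sketch of that index; the name owner_trashing is just a placeholder I made up, pick whatever fits your naming scheme. With owner_identity first, the lookup for each user can go straight to that user's rows and then range-scan on trashing_date.

```sql
-- Composite secondary index: equality on owner_identity,
-- then a range on trashing_date within that owner.
-- The index name owner_trashing is hypothetical.
ALTER TABLE files
  ADD INDEX owner_trashing (owner_identity, trashing_date);
```

On a table with around 109M rows, building the index may take a long time, so it is worth trying on a copy or during a quiet period, and then re-running the EXPLAIN to see whether the optimizer picks it up.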