MySQL optimizer in query

Time: 2015-05-18 11:44:42

Tags: mysql join

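We have a simple database with 4 tables: files, file_versions, users, organizations. With this query I select all files owned by a given organization, with a condition on the trashing date: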

select * FROM organizations o
    LEFT JOIN users u ON o.id=u.organization_id
    LEFT JOIN files f ON u.user_identity=f.owner_identity
    LEFT JOIN file_versions fv ON f.owner_identity=fv.owner_identity
        AND f.local_path=fv.local_path
    WHERE o.id=2001237 AND o.trashed_file_age_limit>=1
        AND f.trashing_date<(1433943058 - o.trashed_file_age_limit*24*60*60);

EXPLAIN SELECT shows me that the optimizer chose the wrong table order, which differs from the order in the query (organizations -> users -> files -> file_versions):

mysql> explain select * FROM organizations o     LEFT JOIN users u ON o.id=u.organization_id     LEFT JOIN files f ON u.user_identity=f.owner_identity     LEFT JOIN file_versions fv ON f.owner_identity=fv.owner_identity         AND f.local_path=fv.local_path     WHERE o.id=2001237 AND o.trashed_file_age_limit>=1         AND f.trashing_date<(1433943058 - o.trashed_file_age_limit*24*60*60);
+----+-------------+-------+--------+----------------------------------+----------+---------+----------------------------------------------------+-----------+-------------+
| id | select_type | table | type   | possible_keys                    | key      | key_len | ref                                                | rows      | Extra       |
+----+-------------+-------+--------+----------------------------------+----------+---------+----------------------------------------------------+-----------+-------------+
|  1 | SIMPLE      | o     | const  | PRIMARY                          | PRIMARY  | 4       | const                                              |         1 |             |
|  1 | SIMPLE      | f     | ALL    | PRIMARY                          | NULL     | NULL    | NULL                                               | 109615125 | Using where |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY,identity,organization_id | identity | 36      | filemirror.f.owner_identity                        |         1 | Using where |
|  1 | SIMPLE      | fv    | ref    | PRIMARY                          | PRIMARY  | 3035    | filemirror.u.user_identity,filemirror.f.local_path |         1 |             |
+----+-------------+-------+--------+----------------------------------+----------+---------+----------------------------------------------------+-----------+-------------+
4 rows in set (0.01 sec)

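This query is slow because of the full scan of the files table. I have to use STRAIGHT_JOIN (which is not equivalent to LEFT JOIN) to fix the table order and make the query fast: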

mysql> explain select * FROM organizations o     STRAIGHT_JOIN users u ON o.id=u.organization_id     STRAIGHT_JOIN files f ON u.user_identity=f.owner_identity     STRAIGHT_JOIN file_versions fv ON f.owner_identity=fv.owner_identity         AND f.local_path=fv.local_path     WHERE o.id=2001237 AND o.trashed_file_age_limit>=1         AND f.trashing_date<(1433943058 - o.trashed_file_age_limit*24*60*60);
+----+-------------+-------+-------+----------------------------------+---------+---------+----------------------------------------------------+---------+-------------+
| id | select_type | table | type  | possible_keys                    | key     | key_len | ref                                                | rows    | Extra       |
+----+-------------+-------+-------+----------------------------------+---------+---------+----------------------------------------------------+---------+-------------+
|  1 | SIMPLE      | o     | const | PRIMARY                          | PRIMARY | 4       | const                                              |       1 |             |
|  1 | SIMPLE      | u     | ref   | PRIMARY,identity,organization_id | PRIMARY | 4       | const                                              |      36 |             |
|  1 | SIMPLE      | f     | ref   | PRIMARY                          | PRIMARY | 36      | filemirror.u.user_identity                         | 6089324 | Using where |
|  1 | SIMPLE      | fv    | ref   | PRIMARY                          | PRIMARY | 3035    | filemirror.u.user_identity,filemirror.f.local_path |       1 |             |
+----+-------------+-------+-------+----------------------------------+---------+---------+----------------------------------------------------+---------+-------------+
4 rows in set (0.00 sec)

My question is: why can MySQL change the table order in an asymmetric join operation such as LEFT JOIN?

Table structure:

CREATE TABLE `file_versions` (
  `owner_identity` char(36) character set latin1 collate latin1_bin NOT NULL,
  `local_path` varchar(999) character set utf8 NOT NULL,
  `version_number` int(11) unsigned NOT NULL,
...
  PRIMARY KEY  (`owner_identity`,`local_path`,`version_number`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;

CREATE TABLE `files` (
  `owner_identity` char(36) character set latin1 collate latin1_bin NOT NULL,
  `local_path` varchar(999) character set utf8 NOT NULL,
  `version_number` int(11) unsigned NOT NULL,
..
  `trashing_date` int(11) default NULL,
...
  PRIMARY KEY  (`owner_identity`,`local_path`),
  KEY `trashing_date` (`trashing_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;

CREATE TABLE `organizations` (
  `id` int(11) NOT NULL,
...
  `trashed_file_age_limit` int(11) default NULL,
...
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;

CREATE TABLE `users` (
  `organization_id` int(11) NOT NULL,
  `id` int(11) NOT NULL,
  `user_identity` char(36) character set latin1 collate latin1_bin NOT NULL,
...
  PRIMARY KEY  (`organization_id`,`id`),
  UNIQUE KEY `identity` (`user_identity`),
  KEY `organization_id` (`organization_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;

MySQL version 5.5

1 answer:

Answer 0 (score: 1)

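Looking at the row estimates, MySQL thinks it needs to read 109M rows of the files table in the first plan, versus 6M for each of the 36 users = 216M rows in the second plan. So reading all 109M rows once, in primary key order, rather than in separate chunks seems reasonable to it. These estimates don't look very plausible to me, so I would try running ANALYZE TABLE on files, but they are only estimates, so you may not get better numbers.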
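To make the comparison concrete, here is the same arithmetic as a small Python snippet, using the "rows" values from the two EXPLAIN outputs in the question. It is a deliberately simplified model (it just multiplies the per-step row estimates for the files table); MySQL's real cost model also weighs access type and I/O, so treat this as an illustration of the reasoning, not of the optimizer's actual formula.

```python
# "rows" estimates copied from the two EXPLAIN outputs in the question.
PLAN1_FILES_FULL_SCAN = 109_615_125   # plan 1: type=ALL on files
PLAN2_USERS = 36                      # plan 2: ref on users
PLAN2_FILES_PER_USER = 6_089_324      # plan 2: ref on files, per user row

# Simplified model (an assumption): rows read from files = product of the
# per-step estimates. MySQL's real cost model considers more than row counts.
plan1_files_rows = PLAN1_FILES_FULL_SCAN
plan2_files_rows = PLAN2_USERS * PLAN2_FILES_PER_USER

print(f"plan 1 (full scan):       {plan1_files_rows:,} rows")
print(f"plan 2 (per-user lookup): {plan2_files_rows:,} rows")
print(f"ratio: {plan2_files_rows / plan1_files_rows:.2f}")
```

With the exact figures, plan 2 comes to 219,215,664 rows (the 216M above rounds 6,089,324 down to 6M), roughly twice plan 1, which is consistent with the optimizer preferring the single full scan.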

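Using a LEFT JOIN and then putting a condition on the joined table in WHERE turns it into an INNER JOIN, as Strawberry said in the comments: a row only survives if it has values for which the WHERE condition is true, and a NULL-extended row from files can never satisfy the f.trashing_date condition. So MySQL is free to reorder those tables. The optimizer also seems to prefer doing the "real inner" joins first, which may be a second reason for this plan.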
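The LEFT-to-INNER effect can be checked on toy data. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; the NULL-rejection behaviour is standard SQL semantics, so the result sets match what MySQL would return, although the optimizers themselves differ. The tables are a cut-down, hypothetical version of users and files, and the rows (alice, bob) are made up for illustration.

```python
import sqlite3

# Hypothetical, cut-down versions of the question's users and files tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (user_identity TEXT PRIMARY KEY);
    CREATE TABLE files (owner_identity TEXT, local_path TEXT, trashing_date INTEGER);
    INSERT INTO users VALUES ('alice'), ('bob');   -- bob owns no files
    INSERT INTO files VALUES ('alice', '/a', 100), ('alice', '/b', 900);
""")

def run(join_kind, where=""):
    """Join users to files with the given join type and optional WHERE clause."""
    sql = ("SELECT u.user_identity, f.local_path FROM users u "
           f"{join_kind} JOIN files f ON u.user_identity = f.owner_identity "
           f"{where} ORDER BY 1, 2")
    return con.execute(sql).fetchall()

# A plain LEFT JOIN keeps bob, NULL-extended on the files side.
left_plain = run("LEFT")
# The WHERE condition on f is never true for NULL, so bob's row is rejected...
left_filtered = run("LEFT", "WHERE f.trashing_date < 500")
# ...and the result is exactly what the INNER JOIN returns.
inner_filtered = run("INNER", "WHERE f.trashing_date < 500")

print(left_plain)
print(left_filtered, inner_filtered)
```

Without the WHERE clause the LEFT JOIN keeps bob with a NULL local_path; with the condition on f that row disappears and the result equals the INNER JOIN, which is what allows the optimizer to treat the join as inner and pick its own order.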

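You can try using STRAIGHT_JOIN differently: if you put it immediately after SELECT, the optimizer uses your join order (where possible; some odd RIGHT JOINs and other corner cases prevent this) without changing the join type of any particular table. Used that way it works as a kind of flag, like SQL_NO_CACHE, rather than as a special join type.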
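STRAIGHT_JOIN itself is MySQL-only, so it cannot be run here; in MySQL the suggestion amounts to writing SELECT STRAIGHT_JOIN * FROM organizations o LEFT JOIN ... with the original joins. As a runnable analogy only, the sketch below uses SQLite's documented CROSS JOIN join-order hint through Python's sqlite3: in SQLite the left operand of CROSS JOIN is always the outer loop. This is a swapped-in SQLite technique, not MySQL behaviour (in MySQL, CROSS JOIN is just a synonym for INNER JOIN and does not pin the order). The table definitions are simplified, hypothetical versions of the question's schema, and loop_order is a hypothetical helper.

```python
import sqlite3

# Simplified, hypothetical SQLite versions of three of the question's tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, trashed_file_age_limit INTEGER);
    CREATE TABLE users (organization_id INTEGER NOT NULL, id INTEGER NOT NULL,
                        user_identity TEXT NOT NULL UNIQUE,
                        PRIMARY KEY (organization_id, id));
    CREATE TABLE files (owner_identity TEXT NOT NULL, local_path TEXT NOT NULL,
                        trashing_date INTEGER,
                        PRIMARY KEY (owner_identity, local_path));
    CREATE INDEX idx_trashing ON files (trashing_date);
""")

# SQLite-specific: CROSS JOIN keeps its left operand as the outer loop, so this
# text order pins the nesting order organizations -> users -> files.
QUERY = """
    SELECT * FROM organizations CROSS JOIN users CROSS JOIN files
    WHERE organizations.id = ?
      AND organizations.trashed_file_age_limit >= 1
      AND users.organization_id = organizations.id
      AND files.owner_identity = users.user_identity
      AND files.trashing_date < 1433943058 - organizations.trashed_file_age_limit * 86400
"""

def loop_order(sql, params):
    """Return table names in the order EXPLAIN QUERY PLAN lists their loops."""
    order = []
    for row in con.execute("EXPLAIN QUERY PLAN " + sql, params):
        words = row[3].split()  # the last column holds the human-readable detail
        if len(words) > 1 and words[0] in ("SCAN", "SEARCH"):
            # Older SQLite prints "SEARCH TABLE t ...", newer prints "SEARCH t ...".
            order.append(words[2] if words[1] == "TABLE" else words[1])
    return order

print(loop_order(QUERY, (2001237,)))
```

The point of the analogy is the same as the answer's advice: a statement-level hint fixes the nesting order while the join conditions stay as written.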

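Then, to do even better, you could try adding an index on files (owner_identity, trashing_date). That should help locate the matching files for each user, instead of filtering globally through the current key on (trashing_date) alone.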
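A minimal way to see what the composite index buys, again using SQLite through Python's sqlite3 as a stand-in (MySQL's optimizer and EXPLAIN output differ): with an index on (owner_identity, trashing_date), a per-user lookup can apply both the equality on owner_identity and the range on trashing_date inside a single index. The index name owner_trashing, the simplified table definition and the sample parameters are hypothetical; the CREATE INDEX statement itself is also valid MySQL syntax.

```python
import sqlite3

# Simplified, hypothetical SQLite version of the question's files table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE files (owner_identity TEXT NOT NULL, local_path TEXT NOT NULL,
                        trashing_date INTEGER,
                        PRIMARY KEY (owner_identity, local_path));
    CREATE INDEX idx_trashing ON files (trashing_date);
    -- The composite index suggested above (the name is hypothetical).
    CREATE INDEX owner_trashing ON files (owner_identity, trashing_date);
""")

# Per-user lookup: equality on owner_identity plus a range on trashing_date,
# i.e. the shape of the f step when files is reached through users.
plan = [row[3] for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM files "
    "WHERE owner_identity = ? AND trashing_date < ?",
    ("some-user-identity", 1433943058),
)]
print(plan)
```

Whether MySQL 5.5 actually picks such an index for the f step of the real query would still need to be confirmed with EXPLAIN on the MySQL side.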