Question

Table schema

The CREATE statements for the two tables are as follows:

Table1: (file_path_key, dir_path_key)

create table Table1(
             file_path_key varchar(500), 
             dir_path_key varchar(500), 
             primary key(file_path_key)) 
engine = innodb;

Table2: (file_path_key, hash_key)

create table Table2(
             file_path_key varchar(500) not null, 
             hash_key bigint(20) not null, 
             foreign key (file_path_key) references Table1(file_path_key) on update cascade on delete cascade)
engine = innodb;

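Goal

Given a file path F and its directory path string D, I need to find every file that has at least one hash value in F's hash set but whose directory is not D. If a file F1 shares several hash values with F, it should appear once per shared hash value.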
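To pin down the expected result, here is a minimal sketch on toy data. It is a stand-in rather than the asker's setup: it uses Python's built-in sqlite3 module instead of MySQL, the paths and hash values are made up, and F's hash set is expressed as a self-join on Table2 (alias f_hashes) instead of a temp table.

```python
import sqlite3

def find_matches(conn, f, d):
    """Return file_path_key once per hash value it shares with file f,
    skipping files whose dir_path_key equals d."""
    sql = """
        SELECT s1.file_path_key
        FROM Table2 AS f_hashes                          -- hash set of F
        JOIN Table2 AS s2 ON s2.hash_key = f_hashes.hash_key
        JOIN Table1 AS s1 ON s1.file_path_key = s2.file_path_key
        WHERE f_hashes.file_path_key = ?
          AND s1.dir_path_key <> ?
        ORDER BY s1.file_path_key
    """
    return [row[0] for row in conn.execute(sql, (f, d))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1(file_path_key TEXT PRIMARY KEY, dir_path_key TEXT);
    CREATE TABLE Table2(file_path_key TEXT NOT NULL, hash_key INTEGER NOT NULL);
    CREATE INDEX idx_table2_hash ON Table2(hash_key);
""")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [
    ("/a/f", "/a"), ("/a/g", "/a"), ("/b/x", "/b"), ("/c/y", "/c"),
])
conn.executemany("INSERT INTO Table2 VALUES (?, ?)", [
    ("/a/f", 1), ("/a/f", 2),
    ("/a/g", 1),               # same directory as F: excluded
    ("/b/x", 1), ("/b/x", 2),  # shares two hashes with F: listed twice
    ("/c/y", 3),               # shares nothing with F: not listed
])
print(find_matches(conn, "/a/f", "/a"))  # ['/b/x', '/b/x']
```

/b/x appears twice because it shares two hash values with F, while /a/g is dropped because it lives in D.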

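Note that the file_path_key column in Table1 and the hash_key column in Table2 are indexed.

In this particular case Table1 has about 350,000 rows and Table2 has 31,167,119 rows, which makes my current query slow: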

create table temp 
        as select hash_key from Table2 
        where file_path_key = F;

select s1.file_path_key 
        from Table1 as s1 
        join Table2 as s2 
        on s1.file_path_key = s2.file_path_key 
        join temp on temp.hash_key = s2.hash_key 
        where s1.dir_path_key != D;
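The same two-step approach (copy F's hashes into a scratch table, then join Table1 and Table2 against it on file_path_key and hash_key) can be sketched as follows. This is again SQLite via Python's sqlite3 as a stand-in for MySQL, with made-up data; the scratch table is called temp_hashes (a hypothetical name, since TEMP is a keyword in SQLite), and it is filled with INSERT ... SELECT so the file path can be passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1(file_path_key TEXT PRIMARY KEY, dir_path_key TEXT);
    CREATE TABLE Table2(file_path_key TEXT NOT NULL, hash_key INTEGER NOT NULL);
    INSERT INTO Table1 VALUES ('/a/f', '/a'), ('/a/g', '/a'), ('/b/x', '/b');
    INSERT INTO Table2 VALUES ('/a/f', 1), ('/a/f', 2), ('/a/g', 1),
                              ('/b/x', 1), ('/b/x', 2);
""")

def two_step(conn, f, d):
    # Step 1: copy F's hash set into a scratch table (hypothetical name).
    conn.execute("DROP TABLE IF EXISTS temp_hashes")
    conn.execute("CREATE TABLE temp_hashes(hash_key INTEGER)")
    conn.execute(
        "INSERT INTO temp_hashes SELECT hash_key FROM Table2 WHERE file_path_key = ?",
        (f,),
    )
    # Step 2: join Table1 to Table2 on the path, then Table2 to the hash set.
    rows = conn.execute("""
        SELECT s1.file_path_key
        FROM Table1 AS s1
        JOIN Table2 AS s2 ON s1.file_path_key = s2.file_path_key
        JOIN temp_hashes AS t ON t.hash_key = s2.hash_key
        WHERE s1.dir_path_key <> ?
        ORDER BY s1.file_path_key
    """, (d,))
    return [r[0] for r in rows]

print(two_step(conn, "/a/f", "/a"))  # ['/b/x', '/b/x']
```

F itself never shows up in the result because its own directory is D.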

How can I speed up this query?

Answer 1

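I don't see what the temp table is for, but keep in mind that a table created with CREATE ... SELECT like this one has no indexes at all. So at the very least, change that statement to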

CREATE TABLE temp (INDEX(hash_key)) ENGINE=InnoDB AS 
SELECT hash_key FROM Table2 WHERE file_path_key = F;

Otherwise the second SELECT has no index to use on temp and must scan it in full for the join, which can be very slow.
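The point about missing indexes can be checked directly. A minimal sketch, assuming SQLite (through Python's sqlite3) as a stand-in: its CREATE TABLE ... AS SELECT likewise copies rows but no indexes, and it has no equivalent of MySQL's inline (INDEX(hash_key)) clause, so the index is added with a separate CREATE INDEX. Table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table2(file_path_key TEXT NOT NULL, hash_key INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_table2_hash ON Table2(hash_key)")
conn.executemany("INSERT INTO Table2 VALUES (?, ?)",
                 [("/a/f", 1), ("/a/f", 2), ("/b/x", 1)])

# CREATE TABLE ... AS SELECT copies the rows and column names, but no indexes.
conn.execute("CREATE TABLE temp_hashes AS "
             "SELECT hash_key FROM Table2 WHERE file_path_key = '/a/f'")
print(conn.execute("PRAGMA index_list('temp_hashes')").fetchall())  # []

# SQLite has no inline INDEX clause for CTAS, so create the index separately.
conn.execute("CREATE INDEX idx_temp_hashes_hash ON temp_hashes(hash_key)")
print([row[1] for row in conn.execute("PRAGMA index_list('temp_hashes')")])
# ['idx_temp_hashes_hash']
```

In MySQL the answer's single CREATE TABLE temp (INDEX(hash_key)) ... AS SELECT statement covers both steps.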

I would also suggest giving Table1 a numeric primary key (INT or BIGINT) and referencing that from Table2 instead of the text column. For example:

create table Table1(
             id int not null auto_increment primary key,
             file_path_key varchar(500), 
             dir_path_key varchar(500), 
             unique key(file_path_key)) 
engine = innodb;

create table Table2(
             file_id int not null, 
             hash_key bigint(20) not null, 
             foreign key (file_id) references Table1(id) 
            on update cascade on delete cascade) engine = innodb;

Queries that join the two tables can be much faster when the join predicate uses integer columns instead of text columns.
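A sketch of the suggested integer-key layout, assuming SQLite through Python's sqlite3 as a stand-in for MySQL/InnoDB (an INTEGER PRIMARY KEY column is assigned automatically, much like AUTO_INCREMENT). The data and the helper names add_file and matches are hypothetical. Each join compares integer columns; the text path is used only once, to look up F's id through the unique key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Table1(
        id INTEGER PRIMARY KEY,        -- surrogate integer key, auto-assigned
        file_path_key TEXT UNIQUE,
        dir_path_key TEXT);
    CREATE TABLE Table2(
        file_id INTEGER NOT NULL REFERENCES Table1(id)
            ON UPDATE CASCADE ON DELETE CASCADE,
        hash_key INTEGER NOT NULL);
    CREATE INDEX idx_table2_hash ON Table2(hash_key);
    CREATE INDEX idx_table2_file ON Table2(file_id);
""")

def add_file(conn, path, directory, hashes):
    # Insert the file, then its hashes keyed by the new integer id.
    cur = conn.execute("INSERT INTO Table1(file_path_key, dir_path_key) VALUES (?, ?)",
                       (path, directory))
    conn.executemany("INSERT INTO Table2 VALUES (?, ?)",
                     [(cur.lastrowid, h) for h in hashes])

def matches(conn, f, d):
    # Apart from the lookup of F by path, every join compares integers.
    rows = conn.execute("""
        SELECT s1.file_path_key
        FROM Table1 AS f1
        JOIN Table2 AS fh ON fh.file_id = f1.id           -- F's hash set
        JOIN Table2 AS s2 ON s2.hash_key = fh.hash_key
        JOIN Table1 AS s1 ON s1.id = s2.file_id
        WHERE f1.file_path_key = ? AND s1.dir_path_key <> ?
        ORDER BY s1.file_path_key
    """, (f, d))
    return [r[0] for r in rows]

add_file(conn, "/a/f", "/a", [1, 2])
add_file(conn, "/a/g", "/a", [1])
add_file(conn, "/b/x", "/b", [1, 2])
print(matches(conn, "/a/f", "/a"))  # ['/b/x', '/b/x']

# ON DELETE CASCADE: removing /b/x from Table1 also removes its Table2 rows.
conn.execute("DELETE FROM Table1 WHERE file_path_key = ?", ("/b/x",))
print(conn.execute("SELECT COUNT(*) FROM Table2").fetchone()[0])  # 3
```

SQLite only enforces the foreign key, and therefore the cascade, when PRAGMA foreign_keys is on, which is why the sketch enables it before creating the tables.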
