MySQL: query with many INNER JOINs responds extremely slowly

Date: 2017-11-21 11:39:35

Tags: mysql sql inner-join query-performance entity-attribute-value

I have read the other related questions, but my case is different because of its structure.

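My application stores about 10,000+ users whose profiles are defined by many parameters (gender, weight, height, hair color, eye color, dancing skills and so on), roughly 100 attributes, say.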

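The application uses these attributes to build a filter form. Users filter the database with this form, which builds a query containing many subqueries, one for each filter used.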

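The problem is that with more than 8-9 filters, the engine bogs down into a very long response (I had to kill the process after waiting 30 minutes).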

So, here is the database structure:

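Table def_attributes (the attribute definitions)

  • id ---> used as attr_id in the values table

Table utilizatori (user definitions; currently only the activ column is used)

  • id ---> named user_id in the other tables
  • activ ---> 1 if the user is active and should be displayed (column indexed)

Table val_atribute (stores each user's attribute values)

  • attr_id ---> the filter's attribute ID (column indexed)
  • attr_value ---> the attribute value
  • user_id (column indexed)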

For example, here is a query built by the filtering form that is slow:

SELECT DISTINCT Q1.user_id
FROM   (SELECT DISTINCT val_atribute.user_id
        FROM   val_atribute
        WHERE  attr_id = 45
               AND attr_value IN ( 'Actor', 'Actor Amator' )) Q1
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 46
                          AND Floor(Datediff(Curdate(), attr_value) / 365) >= '20') Q2
               ON Q1.user_id = Q2.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 46
                          AND Floor(Datediff(Curdate(), attr_value) / 365) <= '50') Q3
               ON Q2.user_id = Q3.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 47
                          AND attr_value IN ( 'feminin', 'masculin' )) Q4
               ON Q3.user_id = Q4.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 102
                          AND attr_value IN ( 'African', 'Asiatic', 'Caucazian', 'Metis' )) Q5
               ON Q4.user_id = Q5.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 103
                          AND attr_value >= 1) Q6
               ON Q5.user_id = Q6.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 103
                          AND attr_value <= 200) Q7
               ON Q6.user_id = Q7.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 104
                          AND attr_value >= 10) Q8
               ON Q7.user_id = Q8.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 104
                          AND attr_value <= 150) Q9
               ON Q8.user_id = Q9.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 107
                          AND attr_value IN ( 'Albastri', 'Caprui', 'Heterocrom', 'Verzi' )) Q10
               ON Q9.user_id = Q10.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 108
                          AND attr_value IN ( 'Blond', 'Brunet', 'Castaniu', 'Roscat', 'Saten' )) Q11
               ON Q10.user_id = Q11.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 109
                          AND attr_value IN ( 'Calvitie', 'Lung', 'Mediu', 'Scurt', 'Zero' )) Q12
               ON Q11.user_id = Q12.user_id
       INNER JOIN (SELECT DISTINCT utilizatori.id
                   FROM   utilizatori
                   WHERE  activ = 1) Q13
               ON Q12.user_id = Q13.id
GROUP  BY user_id

Q2 computes the age, because we only store the [date of birth] attribute, and filter Q2 wants age >= 20.
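The expression Floor(Datediff(Curdate(), attr_value) / 365) only approximates age, because it ignores leap days: a few days before their birthday, a user can already be counted as one year older. A small Python sketch that mirrors the SQL formula next to an exact calendar computation; the function names are hypothetical.

```python
from datetime import date


def approx_age(birth: date, today: date) -> int:
    """Mirror of FLOOR(DATEDIFF(CURDATE(), birth) / 365) from the query."""
    return (today - birth).days // 365


def exact_age(birth: date, today: date) -> int:
    """Completed years: subtract one if this year's birthday hasn't happened yet."""
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - before_birthday


# 7300 days have passed, which the /365 formula counts as 20 full years,
# but the 20th birthday is still five days away.
birth, today = date(2000, 1, 10), date(2020, 1, 5)
print(approx_age(birth, today), exact_age(birth, today))  # 20 19
```

Assuming the date of birth is stored as 'YYYY-MM-DD', comparing attr_value directly against a precomputed cutoff date (for example, "born on or before today minus 20 years") would give an exact test and, unlike wrapping the column in functions, would let an index on attr_value be used.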

The last subquery (here Q13) always selects the active users from table utilizatori.

I think this is a Cartesian product problem, but my question is: how can I rewrite the query to make it faster? Thank you very much!

EDIT / PROBLEM SOLVED:

With Gordon Linoff's great help, I built the correct query using the same filters:

SELECT u.id
FROM   utilizatori u
WHERE  EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 45 AND attr_value IN ( 'Actor', 'Actor Amator' ))
       AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 46 AND Floor(Datediff(Curdate(), attr_value) / 365) >= 20)
       AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 46 AND Floor(Datediff(Curdate(), attr_value) / 365) <= 50)
       AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 47 AND attr_value IN ( 'feminin', 'masculin' ))
       AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 102 AND attr_value IN ( 'African', 'Asiatic', 'Caucazian', 'Metis' ))
       AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 103 AND attr_value >= 1)
       AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 103 AND attr_value <= 200)
       AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 104 AND attr_value >= 10)
       AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 104 AND attr_value <= 150)
       AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 107 AND attr_value IN ( 'Albastri', 'Caprui', 'Heterocrom', 'Verzi' ))
       AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 108 AND attr_value IN ( 'Blond', 'Brunet', 'Castaniu', 'Roscat', 'Saten' ))
       AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 109 AND attr_value IN ( 'Calvitie', 'Lung', 'Mediu', 'Scurt', 'Zero' ))
       AND activ = 1

The query now takes about 0.0015 seconds to run.
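Because the form produces one condition per filter, the EXISTS version lends itself to being generated programmatically. A minimal sketch in Python against an in-memory SQLite database standing in for MySQL; the filter-tuple format, the ALLOWED_OPS whitelist and the build_query helper are hypothetical, and the values are passed as bound parameters rather than spliced into the SQL string.

```python
import sqlite3

# Each filter: (attr_id, operator, value); "IN" takes a list of values.
Filter = tuple[int, str, object]
ALLOWED_OPS = {"=", ">=", "<=", "IN"}


def build_query(filters: list[Filter]) -> tuple[str, list]:
    """Build one EXISTS probe per filter, ANDed together, plus the activ check."""
    clauses, params = [], []
    for attr_id, op, value in filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        if op == "IN":
            placeholders = ", ".join("?" for _ in value)
            cond = f"va.attr_value IN ({placeholders})"
            params.extend([attr_id, *value])
        else:
            cond = f"va.attr_value {op} ?"
            params.extend([attr_id, value])
        clauses.append(
            "EXISTS (SELECT 1 FROM val_atribute va "
            f"WHERE va.user_id = u.id AND va.attr_id = ? AND {cond})"
        )
    where = " AND ".join(clauses + ["u.activ = 1"])
    return f"SELECT u.id FROM utilizatori u WHERE {where} ORDER BY u.id", params


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE utilizatori (id INTEGER PRIMARY KEY, activ INTEGER NOT NULL);
    -- attr_value is left untyped so SQLite keeps ints as ints for range filters
    CREATE TABLE val_atribute (attr_id INTEGER, attr_value, user_id INTEGER);
    """
)
conn.executemany("INSERT INTO utilizatori VALUES (?, ?)", [(1, 1), (2, 1), (3, 0)])
conn.executemany(
    "INSERT INTO val_atribute VALUES (?, ?, ?)",
    [
        (45, "Actor", 1), (47, "feminin", 1), (103, 170, 1),
        (45, "Actor", 2), (47, "masculin", 2), (103, 250, 2),  # fails <= 200
        (45, "Actor Amator", 3), (47, "feminin", 3), (103, 165, 3),  # inactive
    ],
)

filters = [
    (45, "IN", ["Actor", "Actor Amator"]),
    (47, "IN", ["feminin", "masculin"]),
    (103, ">=", 1),
    (103, "<=", 200),
]
sql, params = build_query(filters)
print([row[0] for row in conn.execute(sql, params)])  # [1]
```

Whitelisting the operators and binding the values keeps user input out of the SQL text. With Python MySQL drivers the placeholder is typically %s rather than ?.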

2 answers:

Answer 0 (score: 1)

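MySQL has problems with subqueries, and select distinct makes things worse. You are combining the subqueries with and logic (a user must match every filter). I would suggest building the same logic with exists.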

So:

select u.*
from utilizatori u
where exists (select 1
              from val_atribute va
              where va.user_id = u.id and
                    va.attr_id = 45 and
                    va.attr_value in ( 'Actor', 'Actor Amator' )
             ) and
      exists (select 1
              from val_atribute va
              where va.user_id = u.id and
                    va.attr_id = 46 and
                    Floor(Datediff(Curdate(), va.attr_value) / 365) >= 20
             ) and
      . . .

This version of the query can make use of an index on val_atribute(user_id, attr_id, attr_value). It should be faster and scale better.
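To check that a single probe like the ones inside each EXISTS can be answered from that composite index, a sketch using SQLite's EXPLAIN QUERY PLAN as a stand-in (MySQL's equivalent is EXPLAIN, whose output format differs). The index name idx_va_user_attr_value is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE val_atribute (attr_id INTEGER, attr_value TEXT, user_id INTEGER);
    -- composite index from the answer: equality on user_id and attr_id, with
    -- attr_value included so the probe need not read the table rows
    CREATE INDEX idx_va_user_attr_value
        ON val_atribute (user_id, attr_id, attr_value);
    """
)

# The same shape as one EXISTS probe, with the correlated u.id as a parameter.
probe = (
    "SELECT 1 FROM val_atribute "
    "WHERE user_id = ? AND attr_id = ? AND attr_value IN (?, ?)"
)
plan = conn.execute(
    "EXPLAIN QUERY PLAN " + probe, (1, 45, "Actor", "Actor Amator")
).fetchall()
# Each plan row ends with a human-readable detail string naming the access path.
details = [row[3] for row in plan]
print(details)
```

If the detail mentions the index rather than a full scan, each EXISTS costs one short index lookup per candidate user.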

Answer 1 (score: 1)

This is a variant of the notoriously inefficient EAV (entity-attribute-value) schema design.

The best solution so far (in this question) involves a full table scan of utilizatori, with many probes into the attribute table (val_atribute) to do the filtering.

For efficiency, val_atribute needs PRIMARY KEY(user_id, attr_id). No, separate indexes on those two columns are not as good.
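A sketch of the suggested key. InnoDB stores rows in primary-key order, so PRIMARY KEY(user_id, attr_id) keeps all of one user's attribute rows physically together; SQLite's WITHOUT ROWID tables are also stored in primary-key order and serve as a stand-in here. One side effect worth checking against the data: this key allows only one value per (user_id, attr_id), so a multi-valued attribute would need a different key (for example one that also includes attr_value). Whether any attribute is multi-valued is an assumption, not something the question states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID stores rows in primary-key order, like an InnoDB clustered PK.
conn.execute(
    """
    CREATE TABLE val_atribute (
        user_id    INTEGER NOT NULL,
        attr_id    INTEGER NOT NULL,
        attr_value TEXT,
        PRIMARY KEY (user_id, attr_id)
    ) WITHOUT ROWID
    """
)
conn.execute("INSERT INTO val_atribute VALUES (1, 47, 'feminin')")

try:
    # A second value for the same (user_id, attr_id) violates the key.
    conn.execute("INSERT INTO val_atribute VALUES (1, 47, 'masculin')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```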

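For even better efficiency, you need to pull the commonly used attributes out (into real columns) and index them. That should avoid the full table scan (10K users, plus lots of attribute lookups) by reducing it to a small subset.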
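A minimal sketch of that idea: the most frequently filtered attributes become indexed columns of utilizatori, and rarer attributes stay in val_atribute. Which attributes to promote (gender and date of birth here), the column names gender and birth_date, and the index names are all hypothetical; SQLite again stands in for MySQL. The date bounds correspond to ages 20 to 50 inclusive on 2017-11-21, the question's date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE utilizatori (
        id         INTEGER PRIMARY KEY,
        activ      INTEGER NOT NULL,
        gender     TEXT,   -- promoted from val_atribute (attr_id 47)
        birth_date TEXT    -- promoted from val_atribute (attr_id 46), 'YYYY-MM-DD'
    );
    CREATE INDEX idx_u_activ_gender_birth ON utilizatori (activ, gender, birth_date);
    CREATE TABLE val_atribute (attr_id INTEGER, attr_value TEXT, user_id INTEGER);
    CREATE INDEX idx_va_user_attr_value ON val_atribute (user_id, attr_id, attr_value);
    """
)
conn.executemany(
    "INSERT INTO utilizatori VALUES (?, ?, ?, ?)",
    [
        (1, 1, "feminin", "1990-05-01"),
        (2, 1, "masculin", "2010-03-15"),  # too young
        (3, 1, "feminin", "1985-07-20"),
    ],
)
# Rarer attributes stay in the EAV table; here only eye color (attr_id 107).
conn.executemany(
    "INSERT INTO val_atribute VALUES (?, ?, ?)",
    [(107, "Verzi", 1), (107, "Caprui", 3)],
)

# Indexed columns narrow the candidate set; EXISTS only probes the survivors.
sql = """
    SELECT u.id FROM utilizatori u
    WHERE u.activ = 1
      AND u.gender IN ('feminin', 'masculin')
      AND u.birth_date BETWEEN ? AND ?
      AND EXISTS (SELECT 1 FROM val_atribute va
                  WHERE va.user_id = u.id AND va.attr_id = 107
                    AND va.attr_value IN ('Albastri', 'Verzi'))
    ORDER BY u.id
"""
result = [row[0] for row in conn.execute(sql, ("1966-11-22", "1997-11-21"))]
print(result)  # [1]
```

Storing the date of birth as an ISO 'YYYY-MM-DD' string keeps the BETWEEN comparison correct, because such strings sort in date order.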

More discussion: http://mysql.rjweb.org/doc.php/eav