How to optimize a query that uses COUNT DISTINCT

Date: 2015-11-26 16:20:21

Tags: mysql sql database

I have a very slow MySQL query that I'd like to optimize.

The query takes 66.2070 seconds to return 5 results from a table of about 200 rows.

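The database tables store users, experiments (A/B tests), goals (page URLs), visits (page visits) and conversions (clicks on a goal URL). The visit and conversion tables both have a combination column that records whether version A or B of the page was visited, or whether a conversion came from version A or B. Combinations are stored in the database as 1 or 2.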

I'm trying to get a list of a user's experiments, with the number of visits and conversions for each combination.

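For some relations I use composite primary keys, which does make the joins more complex. I doubt it, but could this be the cause of the problem?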

How can I rewrite this query so that it runs in a reasonable time, ideally under one second?

Here is my database schema:

Database schema diagram

And here is the query:

SELECT e.id                  AS id, 
       e.name                AS name, 
       e.status              AS status, 
       e.created             AS created, 
       Count(DISTINCT v1.id) AS visits1, 
       Count(DISTINCT v2.id) AS visits2, 
       Count(DISTINCT c1.id) AS conversions1, 
       Count(DISTINCT c2.id) AS conversions2 
FROM   experiment e 
       LEFT JOIN visit v1 
              ON ( v1.experiment_id = e.id 
                   AND v1.user_id = e.user_id 
                   AND v1.combination = 1 ) 
       LEFT JOIN visit v2 
              ON ( v2.experiment_id = e.id 
                   AND v2.user_id = e.user_id 
                   AND v2.combination = 2 ) 
       LEFT JOIN goal g 
              ON ( g.experiment_id = e.id 
                   AND g.user_id = e.user_id 
                   AND g.principal = 1 ) 
       LEFT JOIN conversion c1 
              ON ( c1.experiment_id = e.id 
                   AND c1.user_id = e.user_id 
                   AND c1.goal_id = g.id 
                   AND c1.combination = 1 ) 
       LEFT JOIN conversion c2 
              ON ( c2.experiment_id = e.id 
                   AND c2.user_id = e.user_id 
                   AND c2.goal_id = g.id 
                   AND c2.combination = 2 ) 
WHERE  e.user_id = 25 
GROUP  BY e.id 
ORDER  BY e.created DESC 
LIMIT  5 
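A likely explanation, which the question does not state and which is an inference here, is join fan-out: every LEFT JOIN multiplies the rows per experiment, so the server materializes (visits on A) × (visits on B) × (conversions on A) × (conversions on B) rows before COUNT(DISTINCT ...) collapses them again. The minimal sketch below reproduces this with Python's built-in sqlite3 module standing in for MySQL; the table layout is a hypothetical reduction of the schema to the columns the query uses, and the row counts are invented purely for illustration.

```python
import sqlite3

# Hypothetical minimal schema: only the columns the question's query touches.
# SQLite stands in for MySQL; composite keys and other columns are omitted.
SCHEMA = """
CREATE TABLE experiment (id INTEGER, user_id INTEGER, name TEXT, status TEXT, created TEXT);
CREATE TABLE visit (id INTEGER, experiment_id INTEGER, user_id INTEGER, combination INTEGER);
CREATE TABLE goal (id INTEGER, experiment_id INTEGER, user_id INTEGER, principal INTEGER);
CREATE TABLE conversion (id INTEGER, experiment_id INTEGER, user_id INTEGER,
                         goal_id INTEGER, combination INTEGER);
"""

# Same join chain as the question, but also counting the raw joined rows.
FAN_OUT = """
SELECT COUNT(*),
       COUNT(DISTINCT v1.id), COUNT(DISTINCT v2.id),
       COUNT(DISTINCT c1.id), COUNT(DISTINCT c2.id)
FROM experiment e
LEFT JOIN visit v1 ON v1.experiment_id = e.id AND v1.user_id = e.user_id AND v1.combination = 1
LEFT JOIN visit v2 ON v2.experiment_id = e.id AND v2.user_id = e.user_id AND v2.combination = 2
LEFT JOIN goal g ON g.experiment_id = e.id AND g.user_id = e.user_id AND g.principal = 1
LEFT JOIN conversion c1 ON c1.experiment_id = e.id AND c1.user_id = e.user_id
                       AND c1.goal_id = g.id AND c1.combination = 1
LEFT JOIN conversion c2 ON c2.experiment_id = e.id AND c2.user_id = e.user_id
                       AND c2.goal_id = g.id AND c2.combination = 2
WHERE e.user_id = 25
GROUP BY e.id
"""


def build_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO experiment VALUES (1, 25, 'exp', 'running', '2015-11-26')")
    db.execute("INSERT INTO goal VALUES (1, 1, 25, 1)")
    # 3 visits on combination 1 (A), 4 on combination 2 (B).
    db.executemany("INSERT INTO visit VALUES (?, 1, 25, ?)",
                   list(enumerate([1] * 3 + [2] * 4, start=1)))
    # 2 conversions on A, 5 on B, all for the principal goal.
    db.executemany("INSERT INTO conversion VALUES (?, 1, 25, 1, ?)",
                   list(enumerate([1] * 2 + [2] * 5, start=1)))
    return db


if __name__ == "__main__":
    row = build_db().execute(FAN_OUT).fetchone()
    # 3 * 4 * 1 * 2 * 5 = 120 joined rows are built from only 14 visit/conversion rows.
    print(row)  # (120, 3, 4, 2, 5)
```

With realistic traffic the product grows very quickly (for example, a few hundred visits and conversions per combination already yields millions of joined rows per experiment), which would be consistent with the reported runtime even on small tables.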

The result table should look like this:

Results table

2 answers:

Answer 0 (score: 2)

You should do the aggregation before the joins, to avoid building large intermediate results. I think the logic is:

SELECT e.id, e.name, e.status, e.created,
       -- COALESCE turns "no matching rows" into 0, as COUNT(DISTINCT ...) would
       COALESCE(v.visits1, 0) AS visits1, COALESCE(v.visits2, 0) AS visits2,
       COALESCE(g.conversions1, 0) AS conversions1, COALESCE(g.conversions2, 0) AS conversions2
FROM experiment e LEFT JOIN
     -- one row per (experiment, user) with visit counts per combination
     (SELECT experiment_id, user_id, 
             SUM(combination = 1) as visits1,
             SUM(combination = 2) as visits2
      FROM visit
      WHERE combination IN (1, 2)
      GROUP BY experiment_id, user_id
     ) v
     ON v.experiment_id = e.id AND
        v.user_id = e.user_id LEFT JOIN
     -- one row per (experiment, user) with conversion counts for the principal goal
     (SELECT g.experiment_id, g.user_id, 
             SUM(c.combination = 1) as conversions1,
             SUM(c.combination = 2) as conversions2
      FROM goal g LEFT JOIN
           conversion c
           ON c.experiment_id = g.experiment_id AND
              c.user_id = g.user_id AND
              c.goal_id = g.id
      WHERE g.principal = 1
      GROUP BY g.experiment_id, g.user_id
     ) g
     ON g.experiment_id = e.id AND
        g.user_id = e.user_id
WHERE e.user_id = 25 
ORDER BY e.created DESC 
LIMIT 5 ;
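To check that aggregating first gives the same numbers as the original COUNT(DISTINCT ...) query, here is a small sketch that runs both shapes side by side. It again uses Python's sqlite3 module as a stand-in for MySQL, with a hypothetical reduced schema and invented sample rows; the pre-aggregated query wraps the counts in COALESCE so that an experiment with no visits or conversions reads as 0 rather than NULL.

```python
import sqlite3

# Hypothetical minimal schema (SQLite standing in for MySQL): only the
# columns that the two queries reference are modelled.
SCHEMA = """
CREATE TABLE experiment (id INTEGER, user_id INTEGER, name TEXT, status TEXT, created TEXT);
CREATE TABLE visit (id INTEGER, experiment_id INTEGER, user_id INTEGER, combination INTEGER);
CREATE TABLE goal (id INTEGER, experiment_id INTEGER, user_id INTEGER, principal INTEGER);
CREATE TABLE conversion (id INTEGER, experiment_id INTEGER, user_id INTEGER,
                         goal_id INTEGER, combination INTEGER);
"""

# The question's shape, reduced to ids and counts: join first, then COUNT(DISTINCT).
JOIN_THEN_COUNT = """
SELECT e.id, COUNT(DISTINCT v1.id), COUNT(DISTINCT v2.id),
       COUNT(DISTINCT c1.id), COUNT(DISTINCT c2.id)
FROM experiment e
LEFT JOIN visit v1 ON v1.experiment_id = e.id AND v1.user_id = e.user_id AND v1.combination = 1
LEFT JOIN visit v2 ON v2.experiment_id = e.id AND v2.user_id = e.user_id AND v2.combination = 2
LEFT JOIN goal g ON g.experiment_id = e.id AND g.user_id = e.user_id AND g.principal = 1
LEFT JOIN conversion c1 ON c1.experiment_id = e.id AND c1.user_id = e.user_id
                       AND c1.goal_id = g.id AND c1.combination = 1
LEFT JOIN conversion c2 ON c2.experiment_id = e.id AND c2.user_id = e.user_id
                       AND c2.goal_id = g.id AND c2.combination = 2
WHERE e.user_id = 25
GROUP BY e.id
ORDER BY e.created DESC
LIMIT 5
"""

# Answer 0's shape: aggregate each child table first, then join one row per experiment.
COUNT_THEN_JOIN = """
SELECT e.id,
       COALESCE(v.visits1, 0), COALESCE(v.visits2, 0),
       COALESCE(g.conversions1, 0), COALESCE(g.conversions2, 0)
FROM experiment e
LEFT JOIN (SELECT experiment_id, user_id,
                  SUM(combination = 1) AS visits1, SUM(combination = 2) AS visits2
           FROM visit
           WHERE combination IN (1, 2)
           GROUP BY experiment_id, user_id) v
       ON v.experiment_id = e.id AND v.user_id = e.user_id
LEFT JOIN (SELECT g.experiment_id, g.user_id,
                  SUM(c.combination = 1) AS conversions1,
                  SUM(c.combination = 2) AS conversions2
           FROM goal g
           LEFT JOIN conversion c
                  ON c.experiment_id = g.experiment_id AND c.user_id = g.user_id
                 AND c.goal_id = g.id
           WHERE g.principal = 1
           GROUP BY g.experiment_id, g.user_id) g
       ON g.experiment_id = e.id AND g.user_id = e.user_id
WHERE e.user_id = 25
ORDER BY e.created DESC
LIMIT 5
"""


def build_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO experiment VALUES (?, 25, ?, 'running', ?)",
                   [(1, "exp A", "2015-11-01"), (2, "exp B", "2015-11-02")])
    db.executemany("INSERT INTO goal VALUES (?, ?, 25, 1)", [(1, 1), (2, 2)])
    # Experiment 1: 3 visits on A, 4 on B, 2 conversions on A, 5 on B.
    # Experiment 2: a principal goal but no visits or conversions yet.
    db.executemany("INSERT INTO visit VALUES (?, 1, 25, ?)",
                   list(enumerate([1] * 3 + [2] * 4, start=1)))
    db.executemany("INSERT INTO conversion VALUES (?, 1, 25, 1, ?)",
                   list(enumerate([1] * 2 + [2] * 5, start=1)))
    return db


if __name__ == "__main__":
    db = build_db()
    print(db.execute(JOIN_THEN_COUNT).fetchall())  # [(2, 0, 0, 0, 0), (1, 3, 4, 2, 5)]
    print(db.execute(COUNT_THEN_JOIN).fetchall())  # [(2, 0, 0, 0, 0), (1, 3, 4, 2, 5)]
```

The counts match, but the second query never builds more than one row per experiment per derived table, so its cost grows roughly with the number of visit and conversion rows rather than with their product. Note that SUM counts rows, while the original counted distinct ids; the two agree as long as visit.id and conversion.id are unique.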

Further optimizations are possible, for example an index on experiment(user_id, created, id).
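A sketch of that index, again using SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN (the output format differs between the two). The index name experiment_user_created is hypothetical. If the index is usable, the plan should show a search on it with user_id as the equality key and no separate sort step for ORDER BY created DESC.

```python
import sqlite3


def plan_for_user_experiments():
    db = sqlite3.connect(":memory:")
    # Hypothetical reduced experiment table (SQLite standing in for MySQL).
    db.execute("CREATE TABLE experiment (id INTEGER, user_id INTEGER, name TEXT, "
               "status TEXT, created TEXT)")
    # The index suggested in the answer: equality on user_id, after which the
    # matching entries are already ordered by created, with id carried along.
    db.execute("CREATE INDEX experiment_user_created ON experiment (user_id, created, id)")
    rows = db.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id, name, status, created FROM experiment "
        "WHERE user_id = 25 ORDER BY created DESC LIMIT 5"
    ).fetchall()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]


if __name__ == "__main__":
    for step in plan_for_user_experiments():
        print(step)
```

In MySQL the equivalent check is to create the index and run EXPLAIN on the query: the chosen index appears in the key column, and "Using filesort" in the Extra column would indicate that a separate sort is still happening.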

Answer 1 (score: 0)

Regarding your question about the drawbacks of composite keys, I found this:

Drawback of composite keys

I can't test against your database right now, but use MySQL's EXPLAIN statement to find out what is wrong with your query's performance:

MySQL docs about EXPLAIN and optimizing your query with EXPLAIN