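Need MySQL indexing guidance - grouped subquery extremely slow

Time: 2015-01-29 11:52:54

Tags: mysql

Quick overview: I have put together a MySQL query, but its performance needs optimizing.

My original post is here, but it has gone cold, and I would really like to go into detail on some of the suggestions I am trying to implement. So this isn't a duplicate post, but it is related.

This query takes 45+ seconds, and the GROUP BY in the second subquery is what really slows it down.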

SELECT * FROM
(
SELECT DISTINCT email,
       title,
       first_name,
       last_name,
       'chauntry' AS source,
    post_code AS postcode
FROM chauntry
WHERE mailing_indicator = 1
) AS x
JOIN
(
SELECT email, 
           Avg(amount_paid)                AS avg_paid, 
           Count(*)                        AS no_times_booked, 
           Count(DISTINCT( Date_format(added, '%M %Y') )) AS unique_months 
    FROM   chauntry 
    WHERE  added >= Now() - INTERVAL 1 year 
    GROUP  BY email
) AS y
ON x.email = y.email

Based on the indexing advice here, I looked at some index examples and came up with the following:

ALTER TABLE `chauntry` 
  ADD INDEX(`mailing_indicator`, `email`); 

ALTER TABLE `chauntry` 
  ADD INDEX covering_index (`added`, `email`, `amount_paid`);  

This made no difference to the query time. I don't know whether what I'm doing is even close; until now I have never needed to use indexes.
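One way to see whether an index can serve the GROUP BY at all is to look at the query plan. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. That is an assumption: MySQL's optimizer and EXPLAIN output differ, so treat this only as an illustration of why index column order matters. It builds the chauntry table, adds one index at a time, and prints the plan for a simplified version of the second subquery. In SQLite, with `added` first the engine can range-scan the last year but must still sort rows into email groups, while with `email` first the covering index already delivers rows in GROUP BY order. The index name idx_email_added_paid and the fixed cutoff date are hypothetical.

```python
import sqlite3

# Simplified version of the second (aggregate) subquery from the question.
# SQLite has no NOW() - INTERVAL 1 year, so a fixed cutoff date is assumed.
AGG_QUERY = """
    SELECT email, AVG(amount_paid) AS avg_paid, COUNT(*) AS no_times_booked
    FROM chauntry
    WHERE added >= '2014-01-29'
    GROUP BY email
"""


def plan_for(index_sql):
    """Return the EXPLAIN QUERY PLAN detail lines for AGG_QUERY with one index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chauntry ("
        " id INTEGER PRIMARY KEY, email TEXT, title TEXT, first_name TEXT,"
        " last_name TEXT, post_code TEXT, mailing_indicator INTEGER,"
        " amount_paid REAL, added TEXT)"
    )
    conn.execute(index_sql)
    rows = conn.execute("EXPLAIN QUERY PLAN " + AGG_QUERY).fetchall()
    conn.close()
    # The human-readable plan text is the last column of each row.
    return [row[-1] for row in rows]


# The index from the question: `added` first, so rows come back in date order.
added_first = plan_for(
    "CREATE INDEX covering_index ON chauntry (added, email, amount_paid)"
)

# Hypothetical alternative: `email` first, so rows come back already grouped.
email_first = plan_for(
    "CREATE INDEX idx_email_added_paid ON chauntry (email, added, amount_paid)"
)

print("added first:", added_first)
print("email first:", email_first)
```

On the real MySQL server the equivalent check is to prefix the query with EXPLAIN; "Using temporary; Using filesort" in the Extra column typically means the grouping is not being satisfied by an index.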

Any suggestions on how to index my table properly, or how to rewrite the query, are welcome.

3 answers:

Answer 0 (score: 0)

Out of curiosity, does this query do what you want?

SELECT email, title, first_name, last_name, 'chauntry' AS source,
       post_code AS postcode,
       Avg(amount_paid)                AS avg_paid, 
       Count(*)                        AS no_times_booked, 
       Count(DISTINCT( Date_format(added, '%M %Y') )) AS unique_months 
FROM   chauntry 
WHERE  added >= Now() - INTERVAL 1 year 
GROUP  BY email, title, first_name, last_name, post_code
HAVING SUM(mailing_indicator = 1) > 0;

It seems to follow the same logic as your query, except that the mailing indicator needs to have been set within the past year.
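The difference mentioned above can be seen on a tiny data set. This sketch again uses Python's sqlite3 module with SQLite standing in for MySQL, a fixed cutoff date in place of NOW() - INTERVAL 1 year, both queries reduced to the email column, and hypothetical sample rows. Alice set the mailing flag only on a booking older than a year, so the original query keeps her while answer 0 drops her.

```python
import sqlite3

CUTOFF = "2014-01-29"  # assumed stand-in for NOW() - INTERVAL 1 year

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chauntry (email TEXT, title TEXT, first_name TEXT,"
    " last_name TEXT, post_code TEXT, mailing_indicator INTEGER,"
    " amount_paid REAL, added TEXT)"
)
# Hypothetical rows: Alice opted in on an old booking only, Bob on a recent one.
conn.executemany(
    "INSERT INTO chauntry VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("alice@example.com", "Ms", "Alice", "Smith", "AB1 2CD", 1, 30.0, "2013-06-01"),
        ("alice@example.com", "Ms", "Alice", "Smith", "AB1 2CD", 0, 20.0, "2014-06-01"),
        ("bob@example.com", "Mr", "Bob", "Jones", "EF3 4GH", 1, 40.0, "2014-07-01"),
    ],
)

# Original query shape: flagged emails joined to emails booked in the last year.
original = conn.execute(
    """
    SELECT x.email FROM
      (SELECT DISTINCT email FROM chauntry WHERE mailing_indicator = 1) AS x
    JOIN
      (SELECT email FROM chauntry WHERE added >= ? GROUP BY email) AS y
    ON x.email = y.email
    ORDER BY x.email
    """,
    (CUTOFF,),
).fetchall()

# Answer 0 shape: the flag must appear on a booking inside the date window.
answer0 = conn.execute(
    """
    SELECT email FROM chauntry
    WHERE added >= ?
    GROUP BY email, title, first_name, last_name, post_code
    HAVING SUM(mailing_indicator = 1) > 0
    ORDER BY email
    """,
    (CUTOFF,),
).fetchall()

print("original:", original)
print("answer 0:", answer0)
```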

Answer 1 (score: 0)

Why JOIN two subselects on the same table? I would try this:

SELECT email,
           title,
           first_name,
           last_name,
           'chauntry' AS source,
           post_code AS postcode,
           Avg(amount_paid)                               AS avg_paid, 
           Count(*)                                       AS no_times_booked, 
           Count(DISTINCT( Date_format(added, '%M %Y') )) AS unique_months
FROM chauntry
WHERE
    mailing_indicator = 1 and
    added >= Now() - INTERVAL 1 year
GROUP BY email

Also, I don't think you need any indexes for this kind of query, except possibly on added and email, but you have already added those.
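Because this version puts mailing_indicator = 1 in the same WHERE clause as the date filter, it only aggregates bookings that themselves carry the flag, whereas the original query aggregates every booking from the last year for any email that has the flag on some row. The sketch below shows the effect on the counts, using Python's sqlite3 module with SQLite standing in for MySQL, a fixed cutoff date, and hypothetical sample rows.

```python
import sqlite3

CUTOFF = "2014-01-29"  # assumed stand-in for NOW() - INTERVAL 1 year

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chauntry (email TEXT, mailing_indicator INTEGER,"
    " amount_paid REAL, added TEXT)"
)
# Hypothetical rows: two recent bookings, only one of them flagged for mailing.
conn.executemany(
    "INSERT INTO chauntry VALUES (?, ?, ?, ?)",
    [
        ("alice@example.com", 1, 10.0, "2014-03-01"),
        ("alice@example.com", 0, 30.0, "2014-07-01"),
    ],
)

# Original query shape: aggregate every recent booking of a flagged email.
original = conn.execute(
    """
    SELECT x.email, y.no_times_booked, y.avg_paid FROM
      (SELECT DISTINCT email FROM chauntry WHERE mailing_indicator = 1) AS x
    JOIN
      (SELECT email, COUNT(*) AS no_times_booked, AVG(amount_paid) AS avg_paid
       FROM chauntry WHERE added >= ? GROUP BY email) AS y
    ON x.email = y.email
    """,
    (CUTOFF,),
).fetchall()

# Answer 1 shape: the flag filter also removes unflagged bookings.
answer1 = conn.execute(
    """
    SELECT email, COUNT(*) AS no_times_booked, AVG(amount_paid) AS avg_paid
    FROM chauntry
    WHERE mailing_indicator = 1 AND added >= ?
    GROUP BY email
    """,
    (CUTOFF,),
).fetchall()

print("original:", original)
print("answer 1:", answer1)
```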

Answer 2 (score: 0)

Just a minor play around.

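The average of amount_paid is the biggest problem. If you are prepared to live with the possibility of that figure being inaccurate, you can average the distinct values of the amount_paid field instead. In some cases this gives the wrong value (for example, with 100 bookings, 99 at $1 and 1 at $100, you get an average of $50.50 instead of $1.99), but it may be acceptable if the amounts paid are never repeated.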
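The arithmetic in that example can be checked directly: the true average is (99 × 1 + 100) / 100 = 1.99, while AVG(DISTINCT) only sees the two values 1 and 100. This sketch uses Python's sqlite3 module with SQLite standing in for MySQL; the bookings table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (amount_paid REAL)")
# The answer's example: 99 bookings at $1 and 1 booking at $100.
conn.executemany("INSERT INTO bookings VALUES (?)", [(1.0,)] * 99 + [(100.0,)])

# AVG uses all 100 rows; AVG(DISTINCT) averages only the distinct values {1, 100}.
true_avg, distinct_avg = conn.execute(
    "SELECT AVG(amount_paid), AVG(DISTINCT amount_paid) FROM bookings"
).fetchone()
print(true_avg, distinct_avg)
```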

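Otherwise you can join the table against itself. To get no_times_booked, you can count the DISTINCT unique identifier of the table (I have assumed it is called id here).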

SELECT  c1.email,
        c1.title,
        c1.first_name,
        c1.last_name,
        'chauntry'                      AS source,
        c1.post_code                    AS postcode,
        Avg(DISTINCT c2.amount_paid)    AS avg_paid, 
        Count(DISTINCT c2.id)           AS no_times_booked, 
        Count(DISTINCT( Date_format(c2.added, '%M %Y') )) AS unique_months 
FROM chauntry c1
INNER JOIN chauntry c2
ON c1.email = c2.email
WHERE c1.mailing_indicator = 1
AND c2.added >= Now() - INTERVAL 1 year 
GROUP BY c1.email,
        c1.title,
        c1.first_name,
        c1.last_name,
        source,
        c1.post_code
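Counting DISTINCT c2.id matters because the self-join pairs every flagged c1 row with every recent c2 row for the same email, so a plain COUNT(*) is multiplied by the number of flagged rows. This sketch shows that fan-out using Python's sqlite3 module with SQLite standing in for MySQL, a fixed cutoff date, and hypothetical rows (two flagged rows, three recent bookings).

```python
import sqlite3

CUTOFF = "2014-01-29"  # assumed stand-in for NOW() - INTERVAL 1 year

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chauntry (id INTEGER PRIMARY KEY, email TEXT,"
    " mailing_indicator INTEGER, amount_paid REAL, added TEXT)"
)
# Hypothetical rows: one customer, two flagged rows, three bookings in the window.
conn.executemany(
    "INSERT INTO chauntry VALUES (?, ?, ?, ?, ?)",
    [
        (1, "alice@example.com", 1, 10.0, "2014-03-01"),
        (2, "alice@example.com", 1, 20.0, "2014-05-01"),
        (3, "alice@example.com", 0, 30.0, "2014-07-01"),
    ],
)

# 2 flagged c1 rows x 3 recent c2 rows = 6 joined rows for this email.
joined_rows, distinct_bookings = conn.execute(
    """
    SELECT COUNT(*), COUNT(DISTINCT c2.id)
    FROM chauntry c1
    INNER JOIN chauntry c2 ON c1.email = c2.email
    WHERE c1.mailing_indicator = 1
      AND c2.added >= ?
    GROUP BY c1.email
    """,
    (CUTOFF,),
).fetchone()
print(joined_rows, distinct_bookings)
```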