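MySQL: Log summary query

Time: 2011-12-12 21:02:08

Tags: mysql group-by

My CMS has a module that lets a site display ads. It logs views and clicks. The query I use to summarize the log performs poorly.

Here is the query: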

SELECT `a`.`id`,
    `a`.`active`,
    `a`.`static`,
    `a`.`position`,
    `a`.`file`,
    `a`.`title`,
    `a`.`url`,
    COUNT(DISTINCT `lv`.`id`) AS `views`,
    COUNT(DISTINCT `lc`.`id`) AS `clicks` 
FROM `ads` AS `a` 
LEFT JOIN `ad_log` AS `lv` 
    ON `lv`.`ad_id` = `a`.`id` 
    AND `lv`.`type` = 'view' 
    AND `lv`.`created` BETWEEN '2011-01-01 00:00:00'
        AND '2011-12-31 23:59:59' 
LEFT JOIN `ad_log` AS `lc` 
    ON `lc`.`ad_id` = `a`.`id` 
    AND `lc`.`type` = 'click' 
    AND `lc`.`created` BETWEEN '2011-01-01 00:00:00' 
        AND '2011-12-31 23:59:59' 
GROUP BY `a`.`id` 
ORDER BY `a`.`static` DESC,
    `a`.`position` ASC,
    `a`.`title` ASC 

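The ad_log table has a two-column index on ad_id and type. When I look at the EXPLAIN output, the query is using that index. Would a different index be more efficient?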


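Update

After testing different index combinations, the current one seems to be the best. Maybe there is a better way to write the query?

Here is a screenshot of EXPLAIN SELECT SQL_NO_CACHE ...:

[screenshot: EXPLAIN SELECT SQL_NO_CACHE ...]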


I accepted DRapp's solution, but here is the query I came up with myself. It performs only slightly worse than DRapp's solution:

SELECT `a`.`id`,
    `a`.`active`,
    `a`.`static`,
    `a`.`position`,
    `a`.`file`,
    `a`.`title`,
    `a`.`url`,
    (SELECT COUNT(*)
        FROM `ad_log` 
        WHERE `ad_id` = `a`.`id` 
        AND `type` = 'view' 
        AND `created` BETWEEN '2011-11-01 00:00:00' 
            AND '2011-11-30 23:59:59') AS `views`,
    (SELECT COUNT(*) 
        FROM `ad_log`
        WHERE `ad_id` = `a`.`id`
        AND `type` = 'click'
        AND `created` BETWEEN '2011-11-01 00:00:00'
            AND '2011-11-30 23:59:59') AS `clicks` 
FROM `ads` AS `a` 
ORDER BY `a`.`static` DESC,
    `a`.`position` ASC,
    `a`.`title` ASC 

Best performance

This query, inspired by DRapp's solution, performs even better:

SELECT `a`.`id`,
    `a`.`active`,
    `a`.`static`,
    `a`.`position`,
    `a`.`file`,
    `a`.`title`,
    `a`.`url`,
    SUM(CASE WHEN `l`.`type` = 'view' THEN 1 ELSE 0 END) AS `views`,
    SUM(CASE WHEN `l`.`type` = 'click' THEN 1 ELSE 0 END) AS `clicks` 
FROM `ads` AS `a` 
LEFT JOIN `ad_log` AS `l`
    ON `a`.`id` = `l`.`ad_id`
    AND `l`.`created` BETWEEN '2011-11-01 00:00:00'
        AND '2011-11-30 23:59:59'
GROUP BY `a`.`id`
ORDER BY `a`.`static` DESC,
    `a`.`position` ASC,
    `a`.`title` ASC 

2 Answers:

Answer 0: (score: 2)

You can index ad_id, type, and created together to get faster results.
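A minimal sketch of such an index, assuming the table and column names from the question. The index name `idx_ad_type_created` is made up for illustration, and the column order shown is one reasonable choice rather than something the answer specifies:

```sql
-- Hypothetical index name; columns are taken from the question's ad_log table.
-- The equality columns (ad_id, type) come first and the range column (created)
-- comes last, so the BETWEEN filter can use the final part of the index.
ALTER TABLE ad_log
    ADD INDEX idx_ad_type_created (ad_id, type, created);
```

Re-running EXPLAIN on the query afterwards should show whether the optimizer actually picks the new index.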

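This is a good read on how to index joins. Read the other cases too; they are helpful as well.

You can optimize further by indexing the GROUP BY column, but keep in mind that the more indexes you add, the slower your writes become.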

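Answer 1: (score: 1)

Another approach may be to use a subselect as a join that pre-aggregates all views/clicks ONCE for the date range, and then join that to all available ads.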

SELECT 
      a.id,
      a.active,
      a.static,
      a.position,
      a.file,
      a.title,
      a.url,
      COALESCE( PreAgg.CntViews, 0 ) views,
      COALESCE( PreAgg.CntClicks, 0 ) clicks
   FROM
      ads AS a 
      LEFT JOIN 
         ( select lv.ad_id,
                  sum( if( lv.type = 'view', 1, 0 )) as CntViews,
                  sum( if( lv.type = 'click', 1, 0 )) as CntClicks
              from
                 ad_log lv
              where
                     lv.type in ( 'view', 'click' )
                 and lv.created between '2011-01-01 00:00:00'
                                    AND '2011-12-31 23:59:59' 
              group by
                  lv.ad_id ) PreAgg
        on a.id = PreAgg.ad_id

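It might be faster if you have an index on the ad_log table based on (type, created, ad_id). That way the index is grouped by each "type", and within each type the engine can jump right to the date range. So it should only need to hit two sections of the index: the from/to range for "view" and the from/to range for "click", instead of walking each ad ID, then checking the type, then checking the date.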
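As a rough sketch, the suggested index and a way to check it could look like the following. The index name `idx_type_created_ad` is an assumption for illustration; the table, columns, and date range come from the query above:

```sql
-- Hypothetical index name; column order as suggested in this answer.
ALTER TABLE ad_log
    ADD INDEX idx_type_created_ad (type, created, ad_id);

-- Inspect the plan for the pre-aggregation subquery on its own.
EXPLAIN
SELECT lv.ad_id,
       SUM(IF(lv.type = 'view', 1, 0))  AS CntViews,
       SUM(IF(lv.type = 'click', 1, 0)) AS CntClicks
FROM ad_log lv
WHERE lv.type IN ('view', 'click')
  AND lv.created BETWEEN '2011-01-01 00:00:00'
                     AND '2011-12-31 23:59:59'
GROUP BY lv.ad_id;
```

Because the subquery only references type, created, and ad_id, an index containing all three columns can in principle answer it without reading the table rows. Whether the optimizer chooses it depends on the data distribution, so the EXPLAIN output is worth checking.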