Optimizing a MySQL GROUP BY query that uses a date function

Time: 2011-02-11 19:52:37

Tags: mysql group-by query-optimization

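I have a report that pulls information from a summary table, ideally for two periods at once: the current period and the previous one. My table is structured as follows: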

report_table
item_id INT(11)
amount Decimal(8,2)
day DATE

The primary key is (item_id, day). The table currently holds 37k records, covering 92 distinct items and 1200 distinct days. I am using MySQL 5.1.

Here is my SELECT statement:

SELECT r.day, sum(r.amount)/(count(distinct r.item_id)*count(r.day)) AS `current_avg_day`, 
sum(r2.amount)/(count(distinct r2.item_id)*count(r2.day)) AS `previous_avg_day` 
FROM `client_location_item` AS `cla`
 INNER JOIN `client_location` AS `cl`
 INNER JOIN `report_item_day` AS `r`
 INNER JOIN `report_item_day` AS `r2` 
 WHERE (r.item_id = cla.item_id) 
 AND (cla.location_id = cl.location_id) 
 AND (r.day between from_unixtime(1293840000) and from_unixtime(1296518399)) 
 AND (r2.day between from_unixtime(1291161600) and from_unixtime(1293839999)) 
 AND (cl.location_code = 'LOCATION')
 group by month(r.day);
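
For reference, the Unix timestamps in the query can be decoded with a small Python sketch. It is not part of the original query. It assumes the MySQL session time zone is UTC (FROM_UNIXTIME converts using the session time zone), and the period labels "current" and "previous" are added here for illustration only.

```python
from datetime import datetime, timezone

# Bounds copied from the query; the labels are illustrative, not from the schema.
BOUNDS = {
    "current": (1293840000, 1296518399),
    "previous": (1291161600, 1293839999),
}

def decode(ts):
    """Render a Unix timestamp as a UTC string (assumes a UTC session time zone)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

decoded = {name: (decode(lo), decode(hi)) for name, (lo, hi) in BOUNDS.items()}

for name, (lo, hi) in decoded.items():
    print(f"{name}: {lo} .. {hi}")
```

Under that assumption the current period is January 2011 and the previous period is December 2010.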

Currently this query takes 2.2 seconds in my environment. The EXPLAIN plan is:

'1', 'SIMPLE', 'cl', 'ALL', 'PRIMARY', NULL, NULL, NULL, '33', 'Using where; Using temporary; Using filesort'
'1', 'SIMPLE', 'cla', 'ref', 'PRIMARY,location_id,location_id_idxfk', 'location_id', '4', 'cl.location_id', '1', 'Using index'
'1', 'SIMPLE', 'r', 'ref', 'PRIMARY', 'PRIMARY', '4', 'cla.asset_id', '211', 'Using where'
'1', 'SIMPLE', 'r2', 'ALL', NULL, NULL, NULL, NULL, '37602', 'Using where; Using join buffer'

If I add an index on the `day` column, the query does not get faster; it runs in 2.4 seconds instead. The EXPLAIN plan for that version is:

'1', 'SIMPLE', 'r2', 'range', 'report_day_day_idx', 'report_day_day_idx', '3', NULL, '1092', 'Using where; Using temporary; Using filesort'
'1', 'SIMPLE', 'r', 'range', 'PRIMARY,report_day_day_idx', 'report_day_day_idx', '3', NULL, '1180', 'Using where; Using join buffer'
'1', 'SIMPLE', 'cla', 'eq_ref', 'PRIMARY,location_id,location_id_idxfk', 'PRIMARY', '4', 'r.asset_id', '1', 'Using where'
'1', 'SIMPLE', 'cl', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'cla.location_id', '1', 'Using where'

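According to the MySQL documentation, a GROUP BY is processed most efficiently when an index can be used to retrieve the grouping columns. But it also states that the only aggregate functions that can really take advantage of an index are MIN() and MAX(). Does anyone have any idea what I can do to optimize my query further? Or why my "indexed" version runs slower, even though it examines fewer rows overall than the non-indexed version?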

Create table:

CREATE TABLE `report_item_day` (
  `item_id` int(11) NOT NULL,
  `amount` decimal(8,2) DEFAULT NULL,
  `day` date NOT NULL,
  PRIMARY KEY (`item_id`,`day`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

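Of course, my other option is to make 2 DB calls, one per time period. If I do that, each query immediately drops to 0.031s. I still feel there should be a way to optimize this single query to get comparable results.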

3 answers:

Answer 0: (score: 2)

Three things:

1) I don't see any condition on r2.item_id in the WHERE clause. Without one, r2 is joined as a Cartesian product and the sums also include rows for other item_ids.

Change the original query to:

SELECT r.day
      ,sum(r.amount)/(count(distinct r.item_id)*count(r.day)) AS `current_avg_day`
      ,sum(r2.amount)/(count(distinct r2.item_id)*count(r2.day)) AS `previous_avg_day`
FROM `client_location_item` AS `cla`
INNER JOIN `client_location` AS `cl`
INNER JOIN `report_item_day` AS `r`
INNER JOIN `report_item_day` AS `r2`
WHERE (r.item_id = cla.item_id) AND (r2.item_id = cla.item_id) AND (cla.location_id = cl.location_id)
AND (r.day between from_unixtime(1293840000) and from_unixtime(1296518399))
AND (r2.day between from_unixtime(1291161600) and from_unixtime(1293839999))
AND (cl.location_code = 'LOCATION')
group by month(r.day); 

After this, check whether the EXPLAIN plan has changed.

2) Do this: ALTER TABLE report_item_day ADD INDEX (day, item_id);

This lets the index be scanned by day first rather than by item ID.

After this, check whether the EXPLAIN plan has changed.

3) Last resort: refactor the query

SELECT r.day
      ,sum(r.amount)/(count(distinct r.item_id)*count(r.day)) AS `current_avg_day`
      ,sum(r2.amount)/(count(distinct r2.item_id)*count(r2.day)) AS `previous_avg_day`
FROM
(SELECT CLA.item_id FROM client_location CL,client_location_item CLA WHERE CL.location_code = 'LOCATION' AND CLA.location_id=CL.location_id) A,
report_item_day r,
report_item_day r2
WHERE (r.item_id  = A.item_id)
AND   (r2.item_id = A.item_id)
AND   (r.day  between from_unixtime(1293840000) and from_unixtime(1296518399))
AND   (r2.day between from_unixtime(1291161600) and from_unixtime(1293839999))
group by month(r.day);

This can definitely be refactored further. I just rearranged it a little.

Give it a try!!!
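
The Cartesian-product effect from point 1 can be reproduced on a toy data set. The following is only a sketch: it uses Python's built-in sqlite3 instead of MySQL, literal date strings instead of from_unixtime(), and a hypothetical `location_item` table standing in for client_location_item joined to client_location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report_item_day (item_id INTEGER, amount REAL, day TEXT,
                                  PRIMARY KEY (item_id, day));
    -- Hypothetical stand-in for client_location_item joined to client_location.
    CREATE TABLE location_item (item_id INTEGER PRIMARY KEY);
    INSERT INTO location_item VALUES (1);
    INSERT INTO report_item_day VALUES
        (1, 10.0, '2010-12-01'), (1, 10.0, '2010-12-02'),
        (1, 20.0, '2011-01-01'), (1, 20.0, '2011-01-02'),
        (2, 99.0, '2010-12-01'), (2, 99.0, '2010-12-02'),
        (2, 99.0, '2011-01-01'), (2, 99.0, '2011-01-02');
""")

# Same shape as the original query: r is tied to the location, r2 is not.
BASE = """
    SELECT COUNT(*), COUNT(DISTINCT r2.item_id), SUM(r2.amount)
    FROM location_item AS a
    INNER JOIN report_item_day AS r
    INNER JOIN report_item_day AS r2
    WHERE r.item_id = a.item_id
      AND r.day  BETWEEN '2011-01-01' AND '2011-01-31'
      AND r2.day BETWEEN '2010-12-01' AND '2010-12-31'
"""

loose = conn.execute(BASE).fetchone()
fixed = conn.execute(BASE + " AND r2.item_id = a.item_id").fetchone()

print("without r2.item_id condition:", loose)
print("with r2.item_id condition:   ", fixed)
```

Without the extra condition, r2 contributes December rows for item 2 even though only item 1 belongs to the location. With it, only item 1 remains, but every January row is still paired with every December row of the same item, so the raw sums are still multiplied by the row count of the other period.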

Answer 1: (score: 1)

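First (and this may be purely aesthetic), why aren't you using ON / USING clauses in your INNER JOINs? Why do the joining in the WHERE clause instead of in the actual FROM part?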

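Second, my guess on the indexed versus non-indexed question is that the query now has to consult the index first to find the records matching the given range, whereas the non-indexed version works from memory, which is faster than disk. But I can't be too sure.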

Now, for the query. This is part of the documentation on JOINs:

The `conditional_expr` used with ON is any conditional expression of the form 
that can be used in a WHERE clause. Generally, you should use the ON clause for
conditions that specify how to join tables, and the WHERE clause to restrict
which rows you want in the result set.

So yes, move the join conditions into the FROM clause. You may also be interested in the index hint syntax: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

Finally, you could try using views, but beware of performance problems: http://www.mysqlperformanceblog.com/2007/08/12/mysql-view-as-performance-troublemaker/

Good luck.

Answer 2: (score: 1)

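Why are you selecting the day when you group by month? I don't quite understand what you want the output of your query to look like. I hate that MySQL allows this!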

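I'll show you two ways to query the 2 periods at once. The first is a UNION ALL query. It should do what your 2-query approach already does. It will return 2 rows, one per period.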

select sum(r.amount)  / (count(distinct r.item_id)  * count(r.day) ) as curr_avg
  from report_item_day r
  join client_location_item cla using(item_id)
  join client_location      cl  using(location_id)
 where cl.location_code = 'LOCATION'
   and r.day between from_unixtime(1293840000) and from_unixtime(1296518399)
union all
select sum(r.amount)  / (count(distinct r.item_id)  * count(r.day) ) as prev_avg
  from report_item_day r
  join client_location_item cla using(item_id)
  join client_location      cl  using(location_id)
 where cl.location_code = 'LOCATION'
   and r.day between from_unixtime(1291161600) and from_unixtime(1293839999)

The following approach may be faster than the one above, but it is harder to read.

select period
      ,sum(amount) / (count(distinct item_id) * count(day) ) as avg_day
  from (select case when r.day between from_unixtime(1293840000) and from_unixtime(1296518399) then 'Current'
                    when r.day between from_unixtime(1291161600) and from_unixtime(1293839999) then 'Previous'
                end as period
               ,r.amount
               ,r.item_id
               ,r.day
           from report_item_day r
           join client_location_item cla using(item_id)
           join client_location      cl  using(location_id)
          where cl.location_code = 'LOCATION'
            and (    r.day between from_unixtime(1293840000) and from_unixtime(1296518399)
                  or r.day between from_unixtime(1291161600) and from_unixtime(1293839999)
                )
         ) v
 group 
     by period;
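
The period-labelling query above can be tried on toy data. This is a sketch only: it runs on Python's built-in sqlite3 rather than MySQL, replaces from_unixtime() with literal date strings, and uses a hypothetical `location_item` table in place of the two client_location tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report_item_day (item_id INTEGER, amount REAL, day TEXT,
                                  PRIMARY KEY (item_id, day));
    -- Hypothetical stand-in for the items belonging to 'LOCATION'.
    CREATE TABLE location_item (item_id INTEGER PRIMARY KEY);
    INSERT INTO location_item VALUES (1), (2);
    INSERT INTO report_item_day VALUES
        (1, 10.0, '2010-12-01'), (1, 10.0, '2010-12-02'),
        (2,  5.0, '2010-12-01'), (2,  5.0, '2010-12-02'),
        (1, 20.0, '2011-01-01'), (1, 20.0, '2011-01-02'),
        (2, 30.0, '2011-01-01'), (2, 30.0, '2011-01-02'),
        (3, 99.0, '2011-01-01');  -- item 3 is not in the location
""")

# Label each row with its period once, then aggregate per label.
rows = conn.execute("""
    SELECT period,
           SUM(amount) / (COUNT(DISTINCT item_id) * COUNT(day)) AS avg_day
    FROM (SELECT CASE WHEN r.day BETWEEN '2011-01-01' AND '2011-01-31' THEN 'Current'
                      WHEN r.day BETWEEN '2010-12-01' AND '2010-12-31' THEN 'Previous'
                 END AS period,
                 r.amount, r.item_id, r.day
          FROM report_item_day AS r
          JOIN location_item AS a ON a.item_id = r.item_id
          WHERE r.day BETWEEN '2011-01-01' AND '2011-01-31'
             OR r.day BETWEEN '2010-12-01' AND '2010-12-31') AS v
    GROUP BY period
    ORDER BY period
""").fetchall()

for period, avg_day in rows:
    print(period, avg_day)
```

Because report_item_day is read only once and never joined to itself, no row is paired with rows from the other period.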

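Note 1: You didn't give us the DDL, so I couldn't test whether the syntax is correct.
Note 2: Consider creating a calendar table, keyed by DATE. Add suitable columns such as MONTH, WEEK, FINANCIAL_YEAR and so on, to support the kind of reporting you are doing. The queries will be easier to write and to understand.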
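
A calendar table along these lines might look like the following sketch. It again uses Python's built-in sqlite3 for a self-contained demo. The `calendar` table, its `year` and `month` columns, and the sample rows are all assumptions made for illustration, not part of the original schema.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical calendar table: one row per day, keyed by the date.
    CREATE TABLE calendar (day TEXT PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE report_item_day (item_id INTEGER, amount REAL, day TEXT,
                                  PRIMARY KEY (item_id, day));
    INSERT INTO report_item_day VALUES
        (1, 10.0, '2010-12-30'), (1, 14.0, '2010-12-31'),
        (1, 20.0, '2011-01-01'), (1, 22.0, '2011-01-02');
""")

# Populate the calendar for 2010-12-01 through 2011-01-31 (62 days).
start = date(2010, 12, 1)
days = [start + timedelta(days=i) for i in range(62)]
conn.executemany("INSERT INTO calendar VALUES (?, ?, ?)",
                 [(d.isoformat(), d.year, d.month) for d in days])

# Group on plain calendar columns instead of calling a date function per row.
rows = conn.execute("""
    SELECT c.year, c.month, SUM(r.amount) AS total
    FROM report_item_day AS r
    JOIN calendar AS c ON c.day = r.day
    GROUP BY c.year, c.month
    ORDER BY c.year, c.month
""").fetchall()

for year, month, total in rows:
    print(year, month, total)
```

Grouping on (year, month) also keeps December 2010 and December 2011 apart, which a bare month(day) would merge.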