Multiple GROUP BY & ordering by SUM'd group values

Time: 2010-08-27 15:02:10

Tags: mysql group-by

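I'm working on reports for our time-tracking application. Each time entry is associated with a project and a service. Here is a simplified query that groups time entries by project and service.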

SELECT                    
  projects.name as project_name,
  services.name as service_name,
  SUM(minutes) AS minutes 
FROM `time_entries`             
JOIN `projects` ON `projects`.id = `time_entries`.project_id 
JOIN `services` ON `services`.id = `time_entries`.service_id 
GROUP BY 
  time_entries.project_id, 
  time_entries.service_id    
ORDER BY
  max(minutes)   DESC

This produces a table like the following:

+---------------+--------------+---------+
| project_name  | service_name | minutes |
+---------------+--------------+---------+
| Business Card | Consulting   |    4800 |
| Microsite     | Coding       |    3200 |
| Microsite     | Consulting   |    2400 |
| Microsite     | Design       |    2400 |
| Business Card | Design       |     800 |
+---------------+--------------+---------+

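What I'm trying to achieve is ordering by the SUM'd project minutes. The project »Business Card« should not be at the top; the project »Microsite« should, because it has more minutes in total.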

+---------------+--------------+-----------------+---------+
| project_name  | service_name | project_minutes | minutes |
+---------------+--------------+-----------------+---------+
| Microsite     | Coding       |            8000 |    3200 |
| Microsite     | Consulting   |            8000 |    2400 |
| Microsite     | Design       |            8000 |    2400 |
| Business Card | Consulting   |            5600 |    4800 |
| Business Card | Design       |            5600 |     800 |
+---------------+--------------+-----------------+---------+

The only way I found to get the »project_minutes« column is to create a table first and then join it with itself. The queries I came up with:

DROP TABLE IF EXISTS group2;    
CREATE TABLE group2     SELECT                     
  projects.id as project_id,
  projects.name as project_name,
  services.name as service_name,
  SUM(minutes) AS minutes 
FROM `time_entries`             
JOIN `projects` ON `projects`.id = `time_entries`.project_id 
JOIN `services` ON `services`.id = `time_entries`.service_id 
GROUP BY 
  time_entries.project_id, 
  time_entries.service_id    
ORDER BY
  max(minutes)   DESC
LIMIT 0, 30;

SELECT 
  project_name, service_name, project_minutes, minutes
FROM  
  group2
LEFT JOIN 
  (
    SELECT project_id as project_id, sum(minutes) AS project_minutes
      FROM group2
     GROUP BY project_id         
  ) as group1  on group1.project_id = group2.project_id
ORDER BY 
  project_minutes DESC, 
  minutes DESC;    

Because of a MySQL bug(?), I can't even create a temporary table for this: http://www.google.com/search?&q=site:bugs.mysql.com+reopen+temporary+table

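My questions:

  1. What is the best way to get a column like »project_minutes«, which sums up the minutes of each group and adds the result as an extra column? Is there a neat SQL trick I don't know about?
  2. If you don't see a way to do the first, do you think it makes sense to create an extra table for each query? Would that be faster than doing this logic manually in code? We use Rails, in case that matters.

Thanks a lot for your help!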

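Update

Thanks for your replies. I summarized them in a gist to get a better overview: http://gist.github.com/553560

Am I right that there is no way around querying the time_entries table once for every GROUP BY level? If so, do you see a performance problem given the following facts:

1. The time_entries table has by far the most rows (~4 million).
2. Users can group by up to 6 columns. See this screenshot: http://dl.dropbox.com/u/732913/time_entries_grouped_by_customer_project_service_user.png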
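
For the "manually in code" option from question 2, the roll-up only needs the grouped rows, not the raw time entries. A minimal sketch in Python rather than Rails, using the five rows from the first result table above; the function name `add_project_minutes` and the tuple layout are assumptions made for illustration:

```python
from collections import defaultdict

# Grouped rows as (project_name, service_name, minutes), copied from the
# first result table in the question.
grouped_rows = [
    ("Business Card", "Consulting", 4800),
    ("Microsite", "Coding", 3200),
    ("Microsite", "Consulting", 2400),
    ("Microsite", "Design", 2400),
    ("Business Card", "Design", 800),
]


def add_project_minutes(rows):
    """Return (project, service, project_minutes, minutes) tuples, sorted by
    project total and then by the row's own minutes, both descending."""
    totals = defaultdict(int)
    for project, _service, minutes in rows:
        totals[project] += minutes
    enriched = [(p, s, totals[p], m) for p, s, m in rows]
    # Project and service names act as tie-breakers so the order is stable.
    return sorted(enriched, key=lambda r: (-r[2], r[0], -r[3], r[1]))


report = add_project_minutes(grouped_rows)
for row in report:
    print(row)
```

This touches only as many rows as the grouped report contains. Whether it beats a second aggregate in the database would have to be measured on the real data.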

2 Answers:

Answer 0 (score: 0)

Something like this should do what you want:

SELECT ilv1.date_at, ilv1.project_name, ilv1.service_name, ilv1.minutes
FROM
  ( -- minutes per date, project and service
    SELECT
      te1.date_at,
      p1.name AS project_name,
      s1.name AS service_name,
      SUM(te1.minutes) AS minutes
    FROM time_entries te1
    LEFT OUTER JOIN projects p1 ON p1.id = te1.project_id
    LEFT OUTER JOIN services s1 ON s1.id = te1.service_id
    GROUP BY
      te1.date_at,
      te1.project_id,
      te1.service_id
  ) AS ilv1,
  ( -- minutes per date and project
    SELECT
      te2.date_at,
      p2.name AS project_name,
      SUM(te2.minutes) AS minutes
    FROM time_entries te2
    LEFT OUTER JOIN projects p2 ON p2.id = te2.project_id
    GROUP BY
      te2.date_at,
      te2.project_id
  ) AS ilv2
WHERE ilv1.date_at = ilv2.date_at
  AND ilv1.project_name = ilv2.project_name
-- largest project total first, as the question asks
ORDER BY ilv2.minutes DESC, ilv1.minutes DESC;

(Do you really, really need all those outer joins? They will hurt performance a lot.)

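Using a materialized view of the original query (together with a two-pass query with different groupings, as above) would probably be more efficient. A halfway house, though, is to use the same base query twice and wrap one copy in a consolidating block, e.g.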

SELECT ilv1.date_at, ilv1.project_name, ilv1.service_name, ilv1.minutes
FROM
  (....) ilv1,
  (SELECT ilv3.date_at, ilv3.project_name, SUM(ilv3.minutes) AS minutes
   FROM (...copy of ilv1) ilv3
   GROUP BY ilv3.date_at, ilv3.project_name
  ) ilv2
WHERE ilv1.date_at = ilv2.date_at
  AND ilv1.project_name = ilv2.project_name
ORDER BY ilv2.minutes DESC;
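
On servers with window functions (MySQL 8.0 and later, so not an option when this was written), the project total can be computed over the already grouped rows, which means the base query is written only once. The sketch below is an assumption-laden illustration, not the approach of this answer: it runs against an in-memory SQLite database (3.25 or newer) as a stand-in for MySQL, reduces the tables to the columns named in the question, leaves out date_at, and uses invented sample entries chosen so that the totals match the question's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE time_entries (project_id INTEGER, service_id INTEGER, minutes INTEGER);
    INSERT INTO projects VALUES (1, 'Business Card'), (2, 'Microsite');
    INSERT INTO services VALUES (1, 'Consulting'), (2, 'Coding'), (3, 'Design');
    -- Invented entries; they sum to the per-service minutes in the question.
    INSERT INTO time_entries VALUES
        (1, 1, 4000), (1, 1, 800), (1, 3, 800),
        (2, 2, 3200), (2, 1, 2400), (2, 3, 1200), (2, 3, 1200);
""")

# The inner SUM is the per-(project, service) aggregate; the outer SUM ... OVER
# adds those group sums up again within each project.
WINDOW_QUERY = """
    SELECT
      projects.name AS project_name,
      services.name AS service_name,
      SUM(SUM(time_entries.minutes))
        OVER (PARTITION BY time_entries.project_id) AS project_minutes,
      SUM(time_entries.minutes) AS minutes
    FROM time_entries
    JOIN projects ON projects.id = time_entries.project_id
    JOIN services ON services.id = time_entries.service_id
    GROUP BY time_entries.project_id, time_entries.service_id
    ORDER BY project_minutes DESC, minutes DESC, service_name
"""

window_rows = conn.execute(WINDOW_QUERY).fetchall()
for row in window_rows:
    print(row)
```

The extra `service_name` sort key is only there to make the order of the two 2400-minute rows deterministic.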


Answer 1 (score: 0)

I'm assuming that project_id in time_entries is always NOT NULL, while service_id may be NULL.

SELECT t.date_at, t.project_name, t.service_name, p.minutes AS project_minutes, t.minutes
FROM
  ( -- minutes per date, project and service
    SELECT
      time_entries.date_at,
      time_entries.project_id,
      projects.name AS project_name,
      services.name AS service_name,
      SUM(minutes) AS minutes
    FROM time_entries
    JOIN projects ON projects.id = time_entries.project_id
    LEFT JOIN services ON services.id = time_entries.service_id
    GROUP BY
      time_entries.date_at,
      time_entries.project_id,
      time_entries.service_id
  ) t
JOIN
  ( -- minutes per date and project
    SELECT date_at, project_id, SUM(minutes) AS minutes
    FROM time_entries
    GROUP BY date_at, project_id
  ) p
ON (p.date_at = t.date_at AND p.project_id = t.project_id)
-- largest project total first, as the question asks
ORDER BY p.minutes DESC, t.minutes DESC;
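
To check this join against the numbers in the question, here is a small sketch of the same idea with the date_at grouping left out, since the desired output in the question has no date column. It runs on an in-memory SQLite database as a stand-in for MySQL. The reduced table layout, the helper name `build_sample_db` and the individual sample entries are assumptions; only the per-service sums come from the question.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with a reduced, assumed schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE time_entries (project_id INTEGER NOT NULL,
                                   service_id INTEGER,
                                   minutes INTEGER);
        INSERT INTO projects VALUES (1, 'Business Card'), (2, 'Microsite');
        INSERT INTO services VALUES (1, 'Consulting'), (2, 'Coding'), (3, 'Design');
        -- Invented entries; they sum to the per-service minutes in the question.
        INSERT INTO time_entries VALUES
            (1, 1, 4000), (1, 1, 800), (1, 3, 800),
            (2, 2, 3200), (2, 1, 2400), (2, 3, 1200), (2, 3, 1200);
    """)
    return db


# t: minutes per project and service; p: minutes per project.
JOIN_QUERY = """
    SELECT t.project_name, t.service_name, p.minutes AS project_minutes, t.minutes
    FROM (
      SELECT time_entries.project_id,
             projects.name AS project_name,
             services.name AS service_name,
             SUM(time_entries.minutes) AS minutes
      FROM time_entries
      JOIN projects ON projects.id = time_entries.project_id
      LEFT JOIN services ON services.id = time_entries.service_id
      GROUP BY time_entries.project_id, time_entries.service_id
    ) t
    JOIN (
      SELECT project_id, SUM(minutes) AS minutes
      FROM time_entries
      GROUP BY project_id
    ) p ON p.project_id = t.project_id
    ORDER BY p.minutes DESC, t.minutes DESC, t.service_name
"""

joined_rows = build_sample_db().execute(JOIN_QUERY).fetchall()
for row in joined_rows:
    print(row)
```

Unlike a self-join on a temporary table, both derived tables read time_entries directly, so nothing has to be created or reopened. The cost is that the large table is aggregated twice.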