Collecting multi-column aggregate data with a join

Time: 2013-09-04 00:19:28

Tags: mysql sql join

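I'm trying to work out whether the query I want is feasible in SQL at all, or whether I need to collect the raw data and process it in my application.

My schema looks like this: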

applications
================
id INT

application_steps
=================
id INT
application_id INT
step_id INT
activated_at DATE
completed_at DATE

steps
=====
id INT
step_type_id INT

Given this data in application_steps:

| id | application_id | step_id | activated_at | completed_at |
| 1  | 1              | 1       | 2013-01-01   | 2013-01-02   |
| 2  | 1              | 2       | 2013-01-02   | 2013-01-02   |
| 3  | 1              | 3       | 2013-01-02   | 2013-01-10   |
| 4  | 1              | 4       | 2013-01-10   | 2013-01-11   |
| 5  | 2              | 1       | 2013-02-02   | 2013-02-02   |
| 6  | 2              | 2       | 2013-02-02   | 2013-02-07   |
| 7  | 2              | 4       | 2013-02-09   | 2013-02-11   |
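To reproduce the problem, the schema and sample rows above could be loaded roughly as follows. This is only a sketch: the column names and types come from the question, but the PRIMARY KEY constraints are assumptions, and no rows are inserted into steps because the question does not list any step_type_id values.

```sql
-- Assumed DDL: the key constraints are guesses, not stated in the question.
CREATE TABLE applications (
    id INT PRIMARY KEY
);

CREATE TABLE steps (
    id INT PRIMARY KEY,
    step_type_id INT
);

CREATE TABLE application_steps (
    id INT PRIMARY KEY,
    application_id INT,
    step_id INT,
    activated_at DATE,
    completed_at DATE
);

-- Sample rows copied from the table in the question.
INSERT INTO applications (id) VALUES (1), (2);

INSERT INTO application_steps
    (id, application_id, step_id, activated_at, completed_at)
VALUES
    (1, 1, 1, '2013-01-01', '2013-01-02'),
    (2, 1, 2, '2013-01-02', '2013-01-02'),
    (3, 1, 3, '2013-01-02', '2013-01-10'),
    (4, 1, 4, '2013-01-10', '2013-01-11'),
    (5, 2, 1, '2013-02-02', '2013-02-02'),
    (6, 2, 2, '2013-02-02', '2013-02-07'),
    (7, 2, 4, '2013-02-09', '2013-02-11');
```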

I would like to get this result:

| application_id | step_1_days | step_2_days | step_3_days | step_4_days |
| 1              | 1           | 0           | 8           | 1           |
| 2              | 0           | 5           | NULL        | 2           |

Note that in reality I'll be looking at many more steps and many more applications.

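As you can see, there is a has-many relationship between applications and application_steps. A given step may also be unused for a particular application. I'd like to get the time each step took (using DATEDIFF(completed_at, activated_at)), all in one row per application (column names don't matter). Is this possible?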
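For reference, the per-step duration is straightforward to get in row form, one row per application step. The remaining problem is only turning those rows into columns. A minimal sketch against the schema above:

```sql
-- One row per application step, with the elapsed days for that step.
SELECT application_id,
       step_id,
       DATEDIFF(completed_at, activated_at) AS days
FROM application_steps
ORDER BY application_id, step_id;
```

With the sample data this returns seven rows; for example, application 1, step 3 gives 8 days (2013-01-02 to 2013-01-10). Application 2 has no row for step 3, which is why the desired result shows NULL there.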

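Secondary question: to complicate things a bit further, I will also need a second query that joins application_steps with steps and only gets data for steps with a particular step_type_id. Assuming the first part is possible, how can I extend it to filter efficiently?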

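Note: efficiency is key here. This is for an annual report, which amounts to roughly 2,500 applications with 70 distinct steps and 44,000 application_steps in production (not a huge amount of data, but potentially a lot once joins are factored in).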

1 answer:

Answer 0: (score: 1)

This should be a basic "pivot" aggregation:

select application_id,
       max(case when step_id = 1 then datediff(completed_at, activated_at) end) as step_1_days,
       max(case when step_id = 2 then datediff(completed_at, activated_at) end) as step_2_days,
       max(case when step_id = 3 then datediff(completed_at, activated_at) end) as step_3_days,
       max(case when step_id = 4 then datediff(completed_at, activated_at) end) as step_4_days
from application_steps s
group by application_id;

You would have to repeat this for all 70 steps.

To do this only for steps of a particular type:
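If typing out 70 columns by hand is unappealing, one common MySQL workaround is to generate the column list from the steps table and run the result as a prepared statement. The following is a sketch rather than part of the original answer: the variable names @cols and @sql are illustrative, and it assumes every row in steps should become a column.

```sql
-- GROUP_CONCAT truncates at group_concat_max_len (1024 by default),
-- which 70 generated columns would exceed, so raise it for this session.
SET SESSION group_concat_max_len = 100000;

-- Build one "max(case ...) as step_N_days" expression per step id.
SELECT GROUP_CONCAT(
           CONCAT('max(case when step_id = ', id,
                  ' then datediff(completed_at, activated_at) end) as step_',
                  id, '_days')
           ORDER BY id
           SEPARATOR ', ')
INTO @cols
FROM steps;

-- Assemble the full pivot query around the generated column list.
SET @sql = CONCAT('select application_id, ', @cols,
                  ' from application_steps group by application_id');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

The generated statement has the same shape as the hand-written query, so it should perform the same way; the only difference is that the column list follows whatever is in steps at the time it runs.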

select application_id,
       max(case when step_id = 1 then datediff(completed_at, activated_at) end) as step_1_days,
       max(case when step_id = 2 then datediff(completed_at, activated_at) end) as step_2_days,
       max(case when step_id = 3 then datediff(completed_at, activated_at) end) as step_3_days,
       max(case when step_id = 4 then datediff(completed_at, activated_at) end) as step_4_days
from application_steps s join
     steps
     on s.step_id = steps.id and
        steps.step_type_id = XXX
group by application_id;