Question

I'm trying to work out whether the query I want is feasible in SQL at all, or whether I need to collect the raw data and process it in my application.

My schema looks like this:

applications
================
id INT

application_steps
=================
id INT
application_id INT
step_id INT
activated_at DATE
completed_at DATE

steps
=====
id INT
step_type_id INT

Suppose application_steps contains this data:

| id | application_id | step_id | activated_at | completed_at |
| 1  | 1              | 1       | 2013-01-01   | 2013-01-02   |
| 2  | 1              | 2       | 2013-01-02   | 2013-01-02   |
| 3  | 1              | 3       | 2013-01-02   | 2013-01-10   |
| 4  | 1              | 4       | 2013-01-10   | 2013-01-11   |
| 5  | 2              | 1       | 2013-02-02   | 2013-02-02   |
| 6  | 2              | 2       | 2013-02-02   | 2013-02-07   |
| 7  | 2              | 4       | 2013-02-09   | 2013-02-11   |

I want to get this result:

| application_id | step_1_days | step_2_days | step_3_days | step_4_days |
| 1              | 1           | 0           | 8           | 1           |
| 2              | 0           | 5           | NULL        | 2           |

Note that in practice I will be dealing with many more steps and many more applications.

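As you can see, there is a has-many relationship between applications and application_steps. A given step may also go unused for a particular application. I want the time spent on each step (using DATEDIFF(completed_at, activated_at)), all in a single row per application (the column names don't matter). Is this possible?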

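Secondary question: to complicate things further, I also need a second query that joins application_steps with steps and only returns data for steps with a particular step_type_id. Assuming the first part is possible, how can I extend it to filter efficiently?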

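Note: efficiency is key here. This is for an annual report, which in production amounts to roughly 2,500 applications, 70 steps and 44,000 application_steps (not a huge amount of data, but potentially a lot once the joins are factored in).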

Answer 1

This should be a basic "pivot" aggregation:

-- One row per application; each CASE picks out the duration of a single step.
-- A step that an application never used yields NULL in its column.
select application_id,
       max(case when step_id = 1 then datediff(completed_at, activated_at) end) as step_1_days,
       max(case when step_id = 2 then datediff(completed_at, activated_at) end) as step_2_days,
       max(case when step_id = 3 then datediff(completed_at, activated_at) end) as step_3_days,
       max(case when step_id = 4 then datediff(completed_at, activated_at) end) as step_4_days
from application_steps s
group by application_id;

You have to repeat the expression for each of the 70 steps.
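
Writing out 70 nearly identical expressions by hand is tedious, so one option is to let MySQL build the column list from the steps table and run the result as a prepared statement. The following is only a sketch, assuming MySQL and the schema from the question; the variable names @cols and @sql are my own, and the group_concat_max_len value is an arbitrary number picked to be large enough for 70 columns (the default of 1024 bytes would truncate the list).

```sql
-- Raise the GROUP_CONCAT limit for this session so the column list is not truncated.
SET SESSION group_concat_max_len = 100000;

-- Build one "max(case ...) as step_N_days" expression per row in steps.
SELECT GROUP_CONCAT(
           CONCAT('max(case when step_id = ', id,
                  ' then datediff(completed_at, activated_at) end) as step_', id, '_days')
           ORDER BY id
           SEPARATOR ', ')
  INTO @cols
  FROM steps;

-- Assemble the full pivot query around the generated column list.
SET @sql = CONCAT('select application_id, ', @cols,
                  ' from application_steps group by application_id');

-- Run it as a prepared statement and clean up afterwards.
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

The same string could just as well be generated in the application and sent as an ordinary query; the point is only that the 70 columns do not have to be typed by hand.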

To do this only for steps of a particular type:

select application_id,
       max(case when step_id = 1 then datediff(completed_at, activated_at) end) as step_1_days,
       max(case when step_id = 2 then datediff(completed_at, activated_at) end) as step_2_days,
       max(case when step_id = 3 then datediff(completed_at, activated_at) end) as step_3_days,
       max(case when step_id = 4 then datediff(completed_at, activated_at) end) as step_4_days
from application_steps s join
     steps
     on s.step_id = steps.id and
        steps.step_type_id = XXX
group by application_id;
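
To try the filtered query out, here is a small self-contained script using the sample rows from the question. The question does not show any steps data, so the step_type_id values 10 and 20 below are made up for illustration: steps 1 and 2 are assumed to be type 10, steps 3 and 4 type 20.

```sql
-- Schema from the question (primary keys added for convenience).
CREATE TABLE steps (
    id INT PRIMARY KEY,
    step_type_id INT
);

CREATE TABLE application_steps (
    id INT PRIMARY KEY,
    application_id INT,
    step_id INT,
    activated_at DATE,
    completed_at DATE
);

-- Hypothetical step types: 10 and 20 are invented values.
INSERT INTO steps VALUES (1, 10), (2, 10), (3, 20), (4, 20);

-- Sample rows copied from the question.
INSERT INTO application_steps VALUES
    (1, 1, 1, '2013-01-01', '2013-01-02'),
    (2, 1, 2, '2013-01-02', '2013-01-02'),
    (3, 1, 3, '2013-01-02', '2013-01-10'),
    (4, 1, 4, '2013-01-10', '2013-01-11'),
    (5, 2, 1, '2013-02-02', '2013-02-02'),
    (6, 2, 2, '2013-02-02', '2013-02-07'),
    (7, 2, 4, '2013-02-09', '2013-02-11');

-- Pivot restricted to steps of the (made-up) type 10.
select application_id,
       max(case when step_id = 1 then datediff(completed_at, activated_at) end) as step_1_days,
       max(case when step_id = 2 then datediff(completed_at, activated_at) end) as step_2_days,
       max(case when step_id = 3 then datediff(completed_at, activated_at) end) as step_3_days,
       max(case when step_id = 4 then datediff(completed_at, activated_at) end) as step_4_days
from application_steps s join
     steps
     on s.step_id = steps.id and
        steps.step_type_id = 10
group by application_id;

-- Result:
-- | application_id | step_1_days | step_2_days | step_3_days | step_4_days |
-- | 1              | 1           | 0           | NULL        | NULL        |
-- | 2              | 0           | 5           | NULL        | NULL        |
```

Note that the columns for steps of other types are still present but come back as NULL, so in practice you would list only the columns for steps of the type you are filtering on. An application with no steps of that type drops out of the result entirely because of the inner join.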
