SQL query to return grouped results as a single row

Date: 2015-12-12 19:32:01

Tags: sql postgresql pivot crosstab

If I have a jobs table like:
|id|created_at  |status    |
----------------------------
|1 |01-01-2015  |error     |
|2 |01-01-2015  |complete  |
|3 |01-01-2015  |error     |
|4 |01-02-2015  |complete  |
|5 |01-02-2015  |complete  |
|6 |01-03-2015  |error     |
|7 |01-03-2015  |on hold   |
|8 |01-03-2015  |complete  |

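I want a query that groups the rows by date and counts the occurrences of each status, as well as the total number of statuses for that date.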

SELECT created_at, status, count(status)
FROM jobs 
GROUP BY created_at, status;

which gives me:

|created_at  |status    |count|
-------------------------------
|01-01-2015  |error     |2    |
|01-01-2015  |complete  |1    |
|01-02-2015  |complete  |2    |
|01-03-2015  |error     |1    |
|01-03-2015  |on hold   |1    |
|01-03-2015  |complete  |1    |

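Now I want to collapse each unique created_at date into a single row, with a multi-column layout that has one column per status. One constraint is that status is any one of 5 possible words, but not every date necessarily has each status. I also want the total of all statuses for each day. So the desired result would look like: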

|date        |total |errors|completed|on_hold|
----------------------------------------------
|01-01-2015  |3     |2     |1        |null   |
|01-02-2015  |2     |null  |2        |null   |
|01-03-2015  |3     |1     |1        |1      |

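The columns could be built dynamically with something like:

SELECT DISTINCT status FROM jobs;

with null as the result for any date that has no rows with that status. I'm not an SQL expert, but I'm trying to do this in a database view so that I don't have to run multiple queries in Rails.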

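I'm using PostgreSQL, but I'd like to keep it straight SQL if possible. I have tried to understand aggregate functions well enough to use some other tools, without success.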

2 answers:

Answer 0 (score: 3)

The following should work in any RDBMS:

SELECT created_at, count(status) AS total,
       sum(case when status = 'error' then 1 end) as errors,
       sum(case when status = 'complete' then 1 end) as completed,
       sum(case when status = 'on hold' then 1 end) as on_hold
FROM jobs 
GROUP BY created_at;

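The query uses conditional aggregation to pivot the grouped data. It assumes the status values are known in advance. If there are further status values, just add a corresponding sum(case ...) expression for each one.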
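A minimal sketch of the same conditional aggregation, run against the question's sample data. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, and a hypothetical helper build_pivot_sql that generates one sum(case ...) column per status, so a list of statuses (which could come from SELECT DISTINCT status FROM jobs) drives the column layout. Unlike the answer, it uses count(*) for the total, so rows with a NULL status would still be counted.

```python
import sqlite3

# Sample data from the question; dates are kept as text for simplicity.
ROWS = [
    (1, "01-01-2015", "error"),
    (2, "01-01-2015", "complete"),
    (3, "01-01-2015", "error"),
    (4, "01-02-2015", "complete"),
    (5, "01-02-2015", "complete"),
    (6, "01-03-2015", "error"),
    (7, "01-03-2015", "on hold"),
    (8, "01-03-2015", "complete"),
]


def quote_literal(value):
    """Quote a string as an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(value):
    """Quote a string as an SQL identifier, so 'on hold' can be a column name."""
    return '"' + value.replace('"', '""') + '"'


def build_pivot_sql(statuses):
    """Hypothetical helper: one sum(case ...) column per status value."""
    columns = [
        f"sum(case when status = {quote_literal(s)} then 1 end) AS {quote_ident(s)}"
        for s in statuses
    ]
    return (
        "SELECT created_at, count(*) AS total, "
        + ", ".join(columns)
        + " FROM jobs GROUP BY created_at ORDER BY created_at"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id integer PRIMARY KEY, created_at text, status text)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", ROWS)

# Fixed list here; it could also be read with SELECT DISTINCT status FROM jobs.
statuses = ["error", "complete", "on hold"]
cursor = conn.execute(build_pivot_sql(statuses))
names = [d[0] for d in cursor.description]
result = cursor.fetchall()
```

sum() over a case expression with no matching rows yields NULL (None in Python), which produces the null cells the question asks for.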


Answer 1 (score: 2)

An actual crosstab query would look like this:

-- crosstab() is provided by the tablefunc extension: CREATE EXTENSION tablefunc;
SELECT * FROM crosstab(
   $$SELECT created_at, status, count(*) AS ct
     FROM   jobs 
     GROUP  BY 1, 2
     ORDER  BY 1, 2$$

  ,$$SELECT unnest('{error,complete,"on hold"}'::text[])$$)
AS ct (date date, errors int, completed int, on_hold int);

This should perform well.
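To show what crosstab(source_sql, category_sql) does with the rows it receives, here is a rough pure-Python emulation of the two-parameter form. The function name crosstab_sketch is hypothetical, the source rows are what the source query above would return for the question's sample data, and skipping categories that are not in the category list is a simplification of this sketch, not a statement about tablefunc's exact behavior.

```python
from itertools import groupby


def crosstab_sketch(source_rows, categories):
    """Rough emulation of crosstab(source_sql, category_sql).

    source_rows: (row_name, category, value) tuples, sorted by row_name.
    categories: ordered category values; each one gets a fixed output column.
    """
    position = {cat: i for i, cat in enumerate(categories)}
    out = []
    # Consecutive rows with the same row_name collapse into one output row.
    for row_name, group in groupby(source_rows, key=lambda r: r[0]):
        values = [None] * len(categories)  # categories with no row stay NULL
        for _, category, value in group:
            if category in position:  # this sketch skips unknown categories
                values[position[category]] = value
        out.append((row_name, *values))
    return out


# Output of the source query for the sample data (ORDER BY 1, 2).
source = [
    ("01-01-2015", "complete", 1),
    ("01-01-2015", "error", 2),
    ("01-02-2015", "complete", 2),
    ("01-03-2015", "complete", 1),
    ("01-03-2015", "error", 1),
    ("01-03-2015", "on hold", 1),
]
table = crosstab_sketch(source, ["error", "complete", "on hold"])
```

The category query, not the order of the data, fixes the column positions, which is why the column definition list (errors, completed, on_hold) must follow the order of the categories.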


The above does not include the total per date yet. Postgres 9.5 introduces the ROLLUP clause, which is a perfect fit for this case:

SELECT * FROM crosstab(
 $$SELECT created_at, COALESCE(status, 'total'), ct
   FROM  (
      SELECT created_at, status, count(*) AS ct
      FROM   jobs 
      GROUP  BY created_at, ROLLUP(status)
      ) sub
   ORDER  BY 1, 2$$

  ,$$SELECT unnest('{total,error,complete,"on hold"}'::text[])$$)
AS ct (date date, total int, errors int, completed int, on_hold int);

Up to Postgres 9.4, use this as the source query (the first parameter of crosstab()) instead:

WITH cte AS (
    SELECT created_at, status, count(*) AS ct
    FROM   jobs 
    GROUP  BY 1, 2
    )
TABLE  cte
UNION  ALL
SELECT created_at, 'total', sum(ct)
FROM   cte 
GROUP  BY 1
ORDER  BY 1
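To see the rows this source query hands to crosstab(), the sketch below runs it in SQLite via Python's sqlite3 module (an assumption made only so it runs without PostgreSQL). SQLite has no TABLE shorthand, so SELECT * FROM cte stands in for TABLE cte, and ORDER BY 1, 2 is added only to make the output deterministic. The result has the same shape the ROLLUP version produces: one row per date and status, plus a 'total' row per date.

```python
import sqlite3

# Sample data from the question.
ROWS = [
    (1, "01-01-2015", "error"),
    (2, "01-01-2015", "complete"),
    (3, "01-01-2015", "error"),
    (4, "01-02-2015", "complete"),
    (5, "01-02-2015", "complete"),
    (6, "01-03-2015", "error"),
    (7, "01-03-2015", "on hold"),
    (8, "01-03-2015", "complete"),
]

# Pre-9.5 source query, adapted for SQLite (SELECT * FROM cte instead of TABLE cte).
SOURCE_SQL = """
WITH cte AS (
    SELECT created_at, status, count(*) AS ct
    FROM   jobs
    GROUP  BY 1, 2
    )
SELECT * FROM cte
UNION  ALL
SELECT created_at, 'total', sum(ct)
FROM   cte
GROUP  BY 1
ORDER  BY 1, 2
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id integer PRIMARY KEY, created_at text, status text)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", ROWS)
source_rows = conn.execute(SOURCE_SQL).fetchall()
```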


If you want to stick with a simple query, this is a bit shorter:

SELECT created_at
     , count(*) AS total
     , count(status = 'error' OR NULL)    AS errors
     , count(status = 'complete' OR NULL) AS completed
     , count(status = 'on hold' OR NULL)  AS on_hold
FROM   jobs 
GROUP  BY 1;
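The count(... OR NULL) idiom relies on three-valued logic: the comparison is true or false, true OR NULL stays true, false OR NULL becomes NULL, and count() ignores NULLs. A small check, assuming SQLite via Python's sqlite3 as a stand-in; it follows the same OR rules but represents true and false as 1 and 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# true OR NULL -> true (1 in SQLite); false OR NULL -> NULL.
or_null = conn.execute("SELECT (1 = 1) OR NULL, (1 = 2) OR NULL").fetchone()

conn.execute("CREATE TABLE jobs (created_at text, status text)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [("01-01-2015", "error"), ("01-01-2015", "complete"), ("01-01-2015", "error")],
)
# count() skips NULLs, so only rows where the comparison is true are counted.
errors, on_hold = conn.execute(
    "SELECT count(status = 'error' OR NULL), count(status = 'on hold' OR NULL) FROM jobs"
).fetchone()
```

Note that count() returns 0, not NULL, when nothing matches, so dates without a given status show 0 rather than the null shown in the question's desired output.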
For the total per date, count(status) is error-prone because it does not count rows that have NULL in status. Use count(*) instead, which is also shorter and faster.
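A quick check of that difference, using a hypothetical row with a NULL status (the question's sample data has none) and SQLite via Python's sqlite3 as a stand-in for PostgreSQL; both follow the standard rule that count(expr) skips NULLs while count(*) counts rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id integer PRIMARY KEY, created_at text, status text)")
conn.executemany(
    "INSERT INTO jobs (created_at, status) VALUES (?, ?)",
    # The middle row is hypothetical: a job whose status was never set.
    [("01-01-2015", "error"), ("01-01-2015", None), ("01-01-2015", "complete")],
)
# count(*) counts every row; count(status) silently drops the NULL one.
count_star, count_status = conn.execute(
    "SELECT count(*), count(status) FROM jobs"
).fetchone()
```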

In Postgres 9.4+, you can also use the new aggregate FILTER clause, as @a_horse mentioned:

SELECT created_at
     , count(*) AS total
     , count(*) FILTER (WHERE status = 'error')    AS errors
     , count(*) FILTER (WHERE status = 'complete') AS completed
     , count(*) FILTER (WHERE status = 'on hold')  AS on_hold
FROM   jobs 
GROUP  BY 1;
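The FILTER query can also be tried against the sample data in SQLite, which accepts aggregate FILTER clauses from version 3.30.0 on. The sqlite3 stand-in, the version check, and the added ORDER BY 1 are assumptions of this sketch, not part of the answer.

```python
import sqlite3

# Sample data from the question.
ROWS = [
    (1, "01-01-2015", "error"),
    (2, "01-01-2015", "complete"),
    (3, "01-01-2015", "error"),
    (4, "01-02-2015", "complete"),
    (5, "01-02-2015", "complete"),
    (6, "01-03-2015", "error"),
    (7, "01-03-2015", "on hold"),
    (8, "01-03-2015", "complete"),
]

QUERY = """
SELECT created_at
     , count(*) AS total
     , count(*) FILTER (WHERE status = 'error')    AS errors
     , count(*) FILTER (WHERE status = 'complete') AS completed
     , count(*) FILTER (WHERE status = 'on hold')  AS on_hold
FROM   jobs
GROUP  BY 1
ORDER  BY 1
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id integer PRIMARY KEY, created_at text, status text)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", ROWS)

if sqlite3.sqlite_version_info >= (3, 30, 0):
    result = conn.execute(QUERY).fetchall()
else:
    result = None  # this SQLite build predates aggregate FILTER support
```

Because count() never returns NULL, missing statuses show up as 0 here rather than null; the sum(case ...) form in the first answer yields NULL instead.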
