If I have a jobs table like:
|id|created_at |status |
----------------------------
|1 |01-01-2015 |error |
|2 |01-01-2015 |complete |
|3 |01-01-2015 |error |
|4 |01-02-2015 |complete |
|5 |01-02-2015 |complete |
|6 |01-03-2015 |error |
|7 |01-03-2015 |on hold |
|8 |01-03-2015 |complete |
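I want a query that groups the rows by date and counts the occurrences of each status, as well as the total number of statuses for that date. So far I have: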
SELECT created_at, status, count(status)
FROM jobs
GROUP BY created_at, status;
Which gives me:
|created_at |status   |count|
-----------------------------
|01-01-2015 |error    |2    |
|01-01-2015 |complete |1    |
|01-02-2015 |complete |2    |
|01-03-2015 |error    |1    |
|01-03-2015 |on hold  |1    |
|01-03-2015 |complete |1    |
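I now want to condense each unique created_at date into a single row, with one column per status. One constraint is that status is any one of 5 possible words, but a given date may not have every status. I would also like a total of all statuses for each day. So the desired result would look like: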
|date       |total |errors |completed |on_hold |
-----------------------------------------------
|01-01-2015 |3     |2      |1         |null    |
|01-02-2015 |2     |null   |2         |null    |
|01-03-2015 |3     |1      |1         |1       |
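The columns could be built dynamically from something like SELECT DISTINCT status FROM jobs; with a null result for any date that has no rows with that status. I am no SQL expert, but I am trying to do this in a database view so that I don't have to run multiple queries in Rails.
I am using PostgreSQL, but I would like to keep it plain SQL if possible. I have tried to understand aggregate functions well enough to use some other tools, but without success.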
Answer 0 (score: 3)
The following should work in any RDBMS:
SELECT created_at, count(status) AS total,
sum(case when status = 'error' then 1 end) as errors,
sum(case when status = 'complete' then 1 end) as completed,
sum(case when status = 'on hold' then 1 end) as on_hold
FROM jobs
GROUP BY created_at;
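The query uses conditional aggregation to pivot the grouped data. It assumes that the status values are known in advance. If there are additional status values, just add the corresponding sum(case ... expression for each of them. A date without any rows for a given status gets NULL in that column, because the case expression has no else branch and sum() over only NULL values is NULL.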
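As a rough illustration of the conditional aggregation described above, the sketch below generates one sum(case ...) column per status found by SELECT DISTINCT status and runs the result against the sample rows from the question. It is only a sketch under stated assumptions: an in-memory SQLite database stands in for Postgres, the helper build_pivot_sql and its aliasing rule ("on hold" becomes on_hold) are hypothetical, and the column order follows the sorted status names rather than the order in the question.

```python
import re
import sqlite3

# Sample rows from the question (id, created_at, status).
ROWS = [
    (1, "01-01-2015", "error"),
    (2, "01-01-2015", "complete"),
    (3, "01-01-2015", "error"),
    (4, "01-02-2015", "complete"),
    (5, "01-02-2015", "complete"),
    (6, "01-03-2015", "error"),
    (7, "01-03-2015", "on hold"),
    (8, "01-03-2015", "complete"),
]


def build_pivot_sql(statuses):
    """Hypothetical helper: return (sql, params) with one sum(case ...) column per status."""
    columns = []
    for status in statuses:
        # Assumed aliasing rule: runs of non-word characters become "_".
        # A status starting with a digit would need extra handling.
        alias = re.sub(r"\W+", "_", status)
        # The status value itself is passed as a bound parameter.
        columns.append(f"sum(case when status = ? then 1 end) AS {alias}")
    sql = (
        "SELECT created_at, count(status) AS total, "
        + ", ".join(columns)
        + " FROM jobs GROUP BY created_at ORDER BY created_at"
    )
    return sql, list(statuses)


# In-memory SQLite is an assumption made so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER, created_at TEXT, status TEXT)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", ROWS)

statuses = [
    row[0]
    for row in conn.execute("SELECT DISTINCT status FROM jobs ORDER BY status")
]
sql, params = build_pivot_sql(statuses)
pivot = conn.execute(sql, params).fetchall()

# Each row is (created_at, total, complete, error, on_hold);
# a status that does not occur on a date comes back as None (SQL NULL).
for row in pivot:
    print(row)
```

Generating the statement in application code like this keeps the column list in sync with the data, but the set of columns is still fixed at the moment the statement is built.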
Answer 1 (score: 2)
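An actual crosstab query would look like this (crosstab() is provided by the additional module tablefunc, which has to be installed once per database with CREATE EXTENSION tablefunc):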
SELECT * FROM crosstab(
$$SELECT created_at, status, count(*) AS ct
FROM jobs
GROUP BY 1, 2
ORDER BY 1, 2$$
,$$SELECT unnest('{error,complete,"on hold"}'::text[])$$)
AS ct (date date, errors int, completed int, on_hold int);
This should perform very well.
The above does not yet include the total per date. Postgres 9.5 introduced the ROLLUP clause, which is a perfect fit for this case:
SELECT * FROM crosstab(
$$SELECT created_at, COALESCE(status, 'total'), ct
FROM (
SELECT created_at, status, count(*) AS ct
FROM jobs
GROUP BY created_at, ROLLUP(status)
) sub
ORDER BY 1, 2$$
,$$SELECT unnest('{total,error,complete,"on hold"}'::text[])$$)
AS ct (date date, total int, errors int, completed int, on_hold int);
Up to Postgres 9.4, use this query as the first crosstab() argument instead:
WITH cte AS (
SELECT created_at, status, count(*) AS ct
FROM jobs
GROUP BY 1, 2
)
TABLE cte
UNION ALL
SELECT created_at, 'total', sum(ct)
FROM cte
GROUP BY 1
ORDER BY 1
If you want to stick to a simple query, this is a bit shorter:
SELECT created_at
, count(*) AS total
, count(status = 'error' OR NULL) AS errors
, count(status = 'complete' OR NULL) AS completed
, count(status = 'on hold' OR NULL) AS on_hold
FROM jobs
GROUP BY 1;
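Using count(status) for the total per date is error-prone, because it does not count rows with a NULL value in status. Use count(*) instead, which is also shorter and a bit faster.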
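To make the difference concrete, here is a small sketch that compares count(status) with count(*) on a date that has one row without a status, and also shows how count(status = 'error' OR NULL) only counts matching rows. The NULL row is hypothetical (the question's data has none), and an in-memory SQLite database is assumed as a stand-in for Postgres.

```python
import sqlite3

# In-memory SQLite is an assumption made so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER, created_at TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        (1, "01-01-2015", "error"),
        (2, "01-01-2015", "complete"),
        (3, "01-01-2015", None),  # hypothetical row with no status
    ],
)

counts = conn.execute(
    """
    SELECT created_at
         , count(status) AS counted_status  -- skips the row with NULL status
         , count(*)      AS counted_rows    -- counts every row
         , count(status = 'error' OR NULL) AS errors
    FROM   jobs
    GROUP  BY created_at
    """
).fetchone()

# (created_at, counted_status, counted_rows, errors)
print(counts)
```

The expression status = 'error' OR NULL is true for matching rows and NULL otherwise (false OR NULL yields NULL), and count() ignores NULL, so only the matching rows are counted.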
In Postgres 9.4+, use the new aggregate FILTER clause, as @a_horse mentioned:
SELECT created_at
, count(*) AS total
, count(*) FILTER (WHERE status = 'error') AS errors
, count(*) FILTER (WHERE status = 'complete') AS completed
, count(*) FILTER (WHERE status = 'on hold') AS on_hold
FROM jobs
GROUP BY 1;