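I have a complex (for me) SQL query in PostgreSQL 9.2.4 that uses generate_series and multiple joins. I need to sum the reps of all exercises in the exercises table for a given date, and make sure those exercises belong to workouts done by the current user. Finally, I need to join that result to a series so that missing dates are displayed (using generate_series).
My idea was to select the series in the FROM clause and then left join the series to a subquery holding the result of the inner join between the exercises and workouts tables. For example, I have the following query: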
SELECT
DISTINCT date_trunc('day', series.date)::date as date,
sum(COALESCE(reps, 0)) OVER WIN,
array_agg(workout_id) OVER WIN as ids
FROM (
select generate_series(-22, 0) + current_date as date
) series
LEFT JOIN (
exercises INNER JOIN workouts
ON exercises.workout_id = workouts.id
)
ON series.date = exercises.created_at::date
WINDOW
WIN AS (PARTITION BY date_trunc('day', series.date)::date)
ORDER BY date ASC;
This gives the following output:
date | sum | ids
------------+-----+---------------------------------------------------------
2013-04-27 | 0 | {NULL}
2013-04-28 | 432 | {49,48,47,46,45,44,43,42,41,38,37,36,36,36,36,35,34,33}
2013-04-29 | 0 | {NULL}
2013-04-30 | 20 | {50}
2013-05-01 | 0 | {NULL}
2013-05-02 | 0 | {NULL}
2013-05-03 | 0 | {NULL}
2013-05-04 | 0 | {NULL}
2013-05-05 | 0 | {NULL}
2013-05-06 | 0 | {NULL}
2013-05-07 | 40 | {51,51}
2013-05-08 | 0 | {NULL}
2013-05-09 | 0 | {NULL}
2013-05-10 | 0 | {NULL}
2013-05-11 | 0 | {NULL}
2013-05-12 | 0 | {NULL}
2013-05-13 | 0 | {NULL}
2013-05-14 | 0 | {NULL}
2013-05-15 | 0 | {NULL}
2013-05-16 | 20 | {52}
2013-05-17 | 0 | {NULL}
2013-05-18 | 0 | {NULL}
2013-05-19 | 0 | {NULL}
(23 rows)
However, I would like to filter by some condition:
WHERE workouts.user_id = 5
for example.
But if I add a WHERE clause with that condition to the query above, the output is as follows:
date | sum | ids
------------+-----+---------------------------------------------------------
2013-04-28 | 432 | {49,48,47,46,45,44,43,42,41,38,37,36,36,36,36,35,34,33}
2013-04-30 | 20 | {50}
2013-05-07 | 40 | {51,51}
2013-05-16 | 20 | {52}
(4 rows)
The series disappears.
How can I filter by user_id and keep the series? Any help would be greatly appreciated.
Answer 0: (score: 2)
I have a complex (for me) SQL query ...
Indeed, you do. But it doesn't have to be that way:
SELECT s.day
,COALESCE(sum(w.reps), 0) AS sum_reps -- assuming reps comes from workouts
,array_agg(e.workout_id) AS ids
FROM exercises e
JOIN workouts w ON w.id = e.workout_id AND w.user_id = 5
RIGHT JOIN (
SELECT now()::date + generate_series(-22, 0) AS day
) s ON s.day = e.created_at::date
GROUP BY 1
ORDER BY 1;
RIGHT [OUTER] JOIN is the reverse twin of LEFT JOIN. Since joins are applied left to right, you don't need parentheses this way.
Never use the basic type and function name date as an identifier. I substituted day.
Update: To avoid NULL in the result of the aggregate / window function sum(), use an outer COALESCE as demonstrated, COALESCE(sum(reps), 0), rather than the inner form sum(COALESCE(reps, 0)).
You don't need date_trunc() at all. The value is a date to begin with, so this expression is redundant:
date_trunc('day', s.day)::date AS day
In this case, a simple GROUP BY can be used instead of the complex and comparatively expensive combination of DISTINCT + window functions.
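The reason the series disappears with WHERE workouts.user_id = 5 is that the WHERE clause is applied after the outer join: for days without a match, workouts.user_id is NULL, the comparison fails, and the row is dropped. Putting the filter inside the join keeps those days. The following is a minimal sketch of that difference, not the query from the question. It assumes SQLite (via Python's sqlite3 module) as a stand-in for PostgreSQL so it runs anywhere, uses a hypothetical days table in place of generate_series, and uses a LEFT JOIN to a derived table instead of RIGHT JOIN; the sample rows are invented for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days (day TEXT);  -- stand-in for the generate_series() subquery
    CREATE TABLE workouts (id INTEGER, user_id INTEGER);
    CREATE TABLE exercises (workout_id INTEGER, reps INTEGER, created_at TEXT);
    INSERT INTO days VALUES ('2013-05-06'), ('2013-05-07'), ('2013-05-08');
    INSERT INTO workouts VALUES (51, 5), (60, 9);
    INSERT INTO exercises VALUES
        (51, 20, '2013-05-07'), (51, 20, '2013-05-07'), (60, 15, '2013-05-08');
""")

# Filter applied inside the join: every day of the series survives.
filter_in_join = conn.execute("""
    SELECT d.day, COALESCE(sum(x.reps), 0) AS sum_reps
    FROM days d
    LEFT JOIN (
        SELECT e.created_at, e.reps
        FROM exercises e
        JOIN workouts w ON w.id = e.workout_id AND w.user_id = 5
    ) x ON x.created_at = d.day
    GROUP BY d.day
    ORDER BY d.day
""").fetchall()

# Filter applied in WHERE: rows where w.user_id is NULL or not 5 are discarded,
# so days without exercises for user 5 vanish from the result.
filter_in_where = conn.execute("""
    SELECT d.day, COALESCE(sum(e.reps), 0) AS sum_reps
    FROM days d
    LEFT JOIN exercises e ON e.created_at = d.day
    LEFT JOIN workouts w ON w.id = e.workout_id
    WHERE w.user_id = 5
    GROUP BY d.day
    ORDER BY d.day
""").fetchall()

print(filter_in_join)   # three rows, one per day
print(filter_in_where)  # only the day with exercises for user 5
```

With these sample rows, the first query returns all three days (0, 40, 0 reps), while the second returns only 2013-05-07.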
Inner or outer COALESCE()
There has been some confusion about this in recent questions.
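Generally, sum() and other aggregate functions ignore NULL values. The result is the same as if the value were not there at all. However, there are a number of special cases. The manual advises: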
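It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect, and array_agg returns null rather than an empty array when there are no input rows. The coalesce function can be used to substitute zero or an empty array for null when necessary.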
This demo should clarify by demonstrating the corner cases:
1 table with no rows.
3 tables with 1 row holding (NULL / 0 / 1).
3 tables with 2 rows holding NULL and (NULL / 0 / 1).
-- no rows
CREATE TABLE t_empty (i int);
-- INSERT nothing
CREATE TABLE t_0 (i int);
CREATE TABLE t_1 (i int);
CREATE TABLE t_n (i int);
-- 1 row
INSERT INTO t_0 VALUES (0);
INSERT INTO t_1 VALUES (1);
INSERT INTO t_n VALUES (NULL);
CREATE TABLE t_0n (i int);
CREATE TABLE t_1n (i int);
CREATE TABLE t_nn (i int);
-- 2 rows
INSERT INTO t_0n VALUES (0), (NULL);
INSERT INTO t_1n VALUES (1), (NULL);
INSERT INTO t_nn VALUES (NULL), (NULL);
SELECT 't_empty' AS tbl
,count(*) AS ct_all
,count(i) AS ct_i
,sum(i) AS simple_sum
,sum(COALESCE(i, 0)) AS inner_coalesce
,COALESCE(sum(i), 0) AS outer_coalesce
FROM t_empty
UNION ALL
SELECT 't_0', count(*), count(i)
,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_0
UNION ALL
SELECT 't_1', count(*), count(i)
,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_1
UNION ALL
SELECT 't_n', count(*), count(i)
,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_n
UNION ALL
SELECT 't_0n', count(*), count(i)
,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_0n
UNION ALL
SELECT 't_1n', count(*), count(i)
,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_1n
UNION ALL
SELECT 't_nn', count(*), count(i)
,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_nn;
tbl | ct_all | ct_i | simple_sum | inner_coalesce | outer_coalesce
---------+--------+------+------------+----------------+----------------
t_empty | 0 | 0 | <NULL> | <NULL> | 0
t_0 | 1 | 1 | 0 | 0 | 0
t_1 | 1 | 1 | 1 | 1 | 1
t_n | 1 | 0 | <NULL> | 0 | 0
t_0n | 2 | 1 | 0 | 0 | 0
t_1n | 2 | 1 | 1 | 1 | 1
t_nn | 2 | 0 | <NULL> | 0 | 0
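Ergo, my initial advice was sloppy. You may need COALESCE with sum(). But if you do, use an outer COALESCE. The inner COALESCE in the original query does not cover all corner cases and is rarely useful.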
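To connect the corner cases to the query in the question: a day of the series without a match still produces one joined row with reps = NULL, which corresponds to the t_n case, so both forms return 0 there. The forms only diverge when the aggregate sees no rows at all, which corresponds to the t_empty case. A minimal sketch, again assuming SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, with a hypothetical days table in place of generate_series:

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the days table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days (day TEXT);
    CREATE TABLE exercises (reps INTEGER, created_at TEXT);
    INSERT INTO days VALUES ('2013-05-06');  -- a day with no exercises
""")

# Outer join: the unmatched day yields one row with reps = NULL (like t_n).
joined = conn.execute("""
    SELECT sum(COALESCE(e.reps, 0)) AS inner_coalesce,
           COALESCE(sum(e.reps), 0) AS outer_coalesce
    FROM days d
    LEFT JOIN exercises e ON e.created_at = d.day
    GROUP BY d.day
""").fetchone()

# Plain aggregate over zero rows (like t_empty): only the outer form gives 0.
no_rows = conn.execute("""
    SELECT sum(COALESCE(reps, 0)) AS inner_coalesce,
           COALESCE(sum(reps), 0) AS outer_coalesce
    FROM exercises
""").fetchone()

print(joined)
print(no_rows)
```

Here joined is (0, 0), while no_rows is (None, 0): the inner COALESCE never gets a chance to run when there is no input row.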
Answer 1: (score: 1)
Instead of pulling all the data from the workouts table, you can put the condition right there, in a subquery:
SELECT
DISTINCT date_trunc('day', series.date)::date as date,
sum(COALESCE(reps, 0)) OVER WIN,
array_agg(workout_id) OVER WIN as ids
FROM (
select generate_series(-22, 0) + current_date as date
) series
LEFT JOIN (
exercises INNER JOIN (select * from workouts where user_id = 5) workouts
ON exercises.workout_id = workouts.id
)
ON series.date = exercises.created_at::date
WINDOW
WIN AS (PARTITION BY date_trunc('day', series.date)::date)
ORDER BY date ASC;
I think this should give you the desired output.