SQL: selecting from generate_series, filtering by user_id removes the series?

Asked: 2013-05-20 19:28:46

Tags: sql join aggregate-functions postgresql-9.2 generate-series

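I have a complicated (for me) SQL query in PostgreSQL 9.2.4 that uses generate_series and multiple joins. I need to sum the reps of all exercises in the exercises table for a given date, and make sure those exercises belong to workouts done by the current user. Finally, I need to join that result to a series so that missing dates show up as well (using generate_series).

My idea was to select the series in the FROM clause and then left join it to a subquery holding the result of the inner join between the exercises and workouts tables. For example, I have the following query: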

SELECT 
    DISTINCT date_trunc('day', series.date)::date as date,
    sum(COALESCE(reps, 0)) OVER WIN,
    array_agg(workout_id) OVER WIN as ids     
FROM (
    select generate_series(-22, 0) + current_date as date
) series 
LEFT JOIN (
    exercises INNER JOIN workouts 
    ON exercises.workout_id = workouts.id
) 
ON series.date = exercises.created_at::date 
WINDOW 
   WIN AS (PARTITION BY date_trunc('day', series.date)::date)
ORDER BY date ASC;

This gives the following output:

    date    | sum |                           ids                           
------------+-----+---------------------------------------------------------
 2013-04-27 |   0 | {NULL}
 2013-04-28 | 432 | {49,48,47,46,45,44,43,42,41,38,37,36,36,36,36,35,34,33}
 2013-04-29 |   0 | {NULL}
 2013-04-30 |  20 | {50}
 2013-05-01 |   0 | {NULL}
 2013-05-02 |   0 | {NULL}
 2013-05-03 |   0 | {NULL}
 2013-05-04 |   0 | {NULL}
 2013-05-05 |   0 | {NULL}
 2013-05-06 |   0 | {NULL}
 2013-05-07 |  40 | {51,51}
 2013-05-08 |   0 | {NULL}
 2013-05-09 |   0 | {NULL}
 2013-05-10 |   0 | {NULL}
 2013-05-11 |   0 | {NULL}
 2013-05-12 |   0 | {NULL}
 2013-05-13 |   0 | {NULL}
 2013-05-14 |   0 | {NULL}
 2013-05-15 |   0 | {NULL}
 2013-05-16 |  20 | {52}
 2013-05-17 |   0 | {NULL}
 2013-05-18 |   0 | {NULL}
 2013-05-19 |   0 | {NULL}
(23 rows)

However, I would like to filter by some condition:

WHERE workouts.user_id = 5

for example.

But if I add a WHERE clause with that condition to the query above, the output looks like this:

    date    | sum |                           ids                           
------------+-----+---------------------------------------------------------
 2013-04-28 | 432 | {49,48,47,46,45,44,43,42,41,38,37,36,36,36,36,35,34,33}
 2013-04-30 |  20 | {50}
 2013-05-07 |  40 | {51,51}
 2013-05-16 |  20 | {52}
(4 rows)

The series is gone.

How can I filter by user_id and still keep the series? Any help would be much appreciated.

2 answers:

Answer 0 (score: 2)

  

"I have a complicated (for me) SQL query ..."

Indeed you do. But it doesn't have to be that way:

SELECT s.day
      ,COALESCE(sum(w.reps), 0) AS sum_reps  -- assuming reps comes from workouts
      ,array_agg(e.workout_id)  AS ids
FROM   exercises e
JOIN   workouts  w ON w.id = e.workout_id AND w.user_id = 5
RIGHT  JOIN (
   SELECT now()::date + generate_series(-22, 0) AS day
   ) s ON s.day = e.created_at::date 
GROUP  BY 1
ORDER  BY 1;

Major points:

  • RIGHT [OUTER] JOIN is the reverse twin of LEFT JOIN. Since joins are applied left to right, you don't need parentheses this way.

  • Never use the basic type and function name date as an identifier. I substituted day.

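  • Update: To avoid NULL in the result of the aggregate / window function sum(), use an outer COALESCE as demonstrated in the query above:

    COALESCE(sum(reps), 0)

    and not the inner form:

    sum(COALESCE(reps, 0))
  • You don't need date_trunc() at all. s.day is a date to begin with, so this is redundant:

    date_trunc('day', s.day)::date AS day
  • Instead of the complicated and comparatively expensive combination of DISTINCT + window functions, a simple GROUP BY does the job in this case.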
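To see why the position of the user_id condition matters, here is a minimal sketch on made-up data. The tables demo_workouts and demo_exercises and their three sample days are hypothetical stand-ins, not the asker's real schema. A WHERE clause is evaluated after the outer join, so the NULL-extended rows for empty days fail w.user_id = 5 and are dropped; a condition inside the join only restricts which rows get attached to each day.

```sql
-- Hypothetical miniature tables, column names borrowed from the question.
CREATE TEMP TABLE demo_workouts  (id int, user_id int);
CREATE TEMP TABLE demo_exercises (workout_id int, reps int, created_at timestamp);

INSERT INTO demo_workouts  VALUES (1, 5), (2, 7);
INSERT INTO demo_exercises VALUES (1, 20, '2013-05-01 10:00')
                                , (2, 30, '2013-05-02 11:00');

-- (a) Condition in WHERE: days without a matching workout of user 5
--     carry w.user_id IS NULL (or 7) after the join and are filtered out.
SELECT s.day, COALESCE(sum(e.reps), 0) AS sum_reps
FROM  (SELECT date '2013-05-01' + generate_series(0, 2) AS day) s
LEFT   JOIN (
          demo_exercises e
          JOIN demo_workouts w ON w.id = e.workout_id
       ) ON e.created_at::date = s.day
WHERE  w.user_id = 5
GROUP  BY s.day
ORDER  BY s.day;

-- (b) Condition inside the join: every day of the series is kept,
--     and only exercises from workouts of user 5 are summed.
SELECT s.day, COALESCE(sum(e.reps), 0) AS sum_reps
FROM  (SELECT date '2013-05-01' + generate_series(0, 2) AS day) s
LEFT   JOIN (
          demo_exercises e
          JOIN demo_workouts w ON w.id = e.workout_id AND w.user_id = 5
       ) ON e.created_at::date = s.day
GROUP  BY s.day
ORDER  BY s.day;
```

With this sample data, (a) should return only the row for 2013-05-01, while (b) should return all three days, with 0 for the two days on which user 5 logged nothing.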

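Aggregate functions and COALESCE()

There has been some confusion about this in a few recent questions.

Generally, sum() and other aggregate functions ignore NULL values. The result is the same as if the value were not there at all. There are some special cases, however. The manual advises: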

  

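It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect, and array_agg returns null rather than an empty array when there are no input rows. The coalesce function can be used to substitute zero or an empty array for null when necessary.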

This demo should clarify things by walking through the corner cases:

  • 1 table with no rows.
  • 3 tables with 1 row (NULL / 0 / 1).
  • 3 tables with 2 rows: one NULL plus one of (NULL / 0 / 1).

Test setup

-- no rows
CREATE TABLE t_empty (i int);
-- INSERT nothing

CREATE TABLE t_0 (i int);
CREATE TABLE t_1 (i int);
CREATE TABLE t_n (i int);

-- 1 row
INSERT INTO t_0 VALUES (0);
INSERT INTO t_1 VALUES (1);
INSERT INTO t_n VALUES (NULL);

CREATE TABLE t_0n (i int);
CREATE TABLE t_1n (i int);
CREATE TABLE t_nn (i int);

-- 2 rows
INSERT INTO t_0n VALUES (0),    (NULL);
INSERT INTO t_1n VALUES (1),    (NULL);
INSERT INTO t_nn VALUES (NULL), (NULL);

Query

SELECT 't_empty'           AS tbl
      ,count(*)            AS ct_all
      ,count(i)            AS ct_i
      ,sum(i)              AS simple_sum
      ,sum(COALESCE(i, 0)) AS inner_coalesce
      ,COALESCE(sum(i), 0) AS outer_coalesce
FROM   t_empty

UNION ALL
SELECT 't_0',  count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_0
UNION ALL
SELECT 't_1',  count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_1
UNION ALL
SELECT 't_n',  count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_n

UNION ALL
SELECT 't_0n', count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_0n
UNION ALL
SELECT 't_1n', count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_1n
UNION ALL
SELECT 't_nn', count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_nn;

Result

   tbl   | ct_all | ct_i | simple_sum | inner_coalesce | outer_coalesce
---------+--------+------+------------+----------------+----------------
 t_empty |      0 |    0 |     <NULL> |         <NULL> |              0
 t_0     |      1 |    1 |          0 |              0 |              0
 t_1     |      1 |    1 |          1 |              1 |              1
 t_n     |      1 |    0 |     <NULL> |              0 |              0
 t_0n    |      2 |    1 |          0 |              0 |              0
 t_1n    |      2 |    1 |          1 |              1 |              1
 t_nn    |      2 |    0 |     <NULL> |              0 |              0

-> SQLfiddle

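Ergo, my initial advice was sloppy. You may need COALESCE with sum(). But if you do, use an outer COALESCE. The inner COALESCE in the original query does not cover all corner cases and is rarely useful.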
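The manual's remark about array_agg can be checked the same way. The following is only a small sketch: the derived table t is an illustrative device that produces zero rows, so it does not depend on any existing table.

```sql
-- "WHERE false" yields an empty input set for the aggregates.
SELECT sum(i)                       AS simple_sum      -- NULL for no rows
      ,COALESCE(sum(i), 0)          AS outer_coalesce  -- 0
      ,array_agg(i)                 AS simple_agg      -- NULL for no rows
      ,COALESCE(array_agg(i), '{}') AS agg_or_empty    -- empty array
FROM  (SELECT 1 AS i WHERE false) t;
```

Note that this covers the "no input rows" case only. In the date-series query every day has at least one (NULL-extended) row, which is why array_agg shows {NULL} there rather than NULL.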

Answer 1 (score: 1)

Instead of pulling every row from the workouts table, you can put the condition right there, in a subquery:

SELECT 
    DISTINCT date_trunc('day', series.date)::date as date,
    sum(COALESCE(reps, 0)) OVER WIN,
    array_agg(workout_id) OVER WIN as ids     
FROM (
    select generate_series(-22, 0) + current_date as date
) series 
LEFT JOIN (
    exercises INNER JOIN (select * from workouts where user_id = 5) workouts 
    ON exercises.workout_id = workouts.id
) 
ON series.date = exercises.created_at::date 
WINDOW 
   WIN AS (PARTITION BY date_trunc('day', series.date)::date)
ORDER BY date ASC;

I think this should give you the desired output.