Time: 2018-05-23 15:56:52

Tags: sql postgresql

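I am trying to measure the net change between a participant's first survey record and the most recent record for the same question in the same survey. Later surveys are often incomplete, so they contain NULL values.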

Survey answers from the same participant:

┌─────────────────────────────────────┐
| p_id | rank | val_a | val_b | val_c |
|    2 |    1 |     1 |     2 |     3 |
|    2 |    2 |     2 |       |       |
|    2 |    3 |     4 |     4 |     1 |
|    2 |    4 |     4 |     3 |       |
└─────────────────────────────────────┘
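
For anyone who wants to reproduce this, the sample rows above can be loaded with a short setup script. This is only a sketch: the table name `dataset` is taken from the query later in this question, while the integer column types and the primary key are assumptions, since the post does not show the schema.

```sql
-- Assumed schema: the post does not show the real column types or keys.
CREATE TABLE dataset (
    p_id  integer NOT NULL,
    rank  integer NOT NULL,   -- survey order within a participant
    val_a integer,
    val_b integer,
    val_c integer,
    PRIMARY KEY (p_id, rank)
);

-- The four sample rows; blank cells in the table are stored as NULL.
INSERT INTO dataset (p_id, rank, val_a, val_b, val_c) VALUES
    (2, 1, 1, 2,    3),
    (2, 2, 2, NULL, NULL),
    (2, 3, 4, 4,    1),
    (2, 4, 4, 3,    NULL);
```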

Desired output:
┌──────────────────────────────┐
| p_id | val_a | val_b | val_c |
|    2 | 3     | 1     |    -2 |
└──────────────────────────────┘

a = row4 - row1
b = row4 - row1 
c = row3 - row1 (uses the rank3 value since rank4 has none)

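For each column, the result should show the difference between the value in the highest-ranked row where that column is non-null and the value in the first row, which should never be null.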

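So far I have code that gets the difference between two rows, but I can't figure out how to account for nulls when a lower-ranked row holds a non-null value that could be used instead.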

SELECT
    p_id,
    CASE WHEN ("p_val_a" IS NOT null) AND (rank != 1) AND ("val_a" IS NOT null) THEN "val_a" - "p_val_a" ELSE NULL END as "diff_val_a",
    CASE WHEN ("p_val_b" IS NOT null) AND (rank != 1) AND ("val_b" IS NOT null) THEN "val_b" - "p_val_b" ELSE NULL END AS "diff_val_b",
    CASE WHEN ("p_val_c" IS NOT null) AND (rank != 1) AND ("val_c" IS NOT null) THEN "val_c" - "p_val_c" ELSE NULL END AS "diff_val_c"
FROM
    (
        SELECT
            p_id,
            rank,  -- needed by the outer query's "rank != 1" checks
            "val_a",
            "val_b",
            "val_c",
            LAG("val_a") OVER w AS "p_val_a",
            LAG("val_b") OVER w as "p_val_b",
            LAG("val_c") OVER w as "p_val_c"
        FROM
            dataset WINDOW w AS (
                PARTITION BY 
                    p_id
                ORDER BY
                    rank
            )
    ) t;

In the example above, if only the first and last rows are queried, val_a and val_b produce the correct results, but val_c yields null instead of -2.

How can I compare the first row's value with the value of the same column in the most recent row where it is non-null?

2 Answers:

Answer 0 (score: 1)

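Analytic functions can be used to find which row (rank) holds the first non-NULL value for each column, and likewise which row holds the last.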

Conditional aggregation can then pick out those values.

http://sqlfiddle.com/#!17/78886/9

WITH
  analysed
AS
(
  SELECT
    *,
    MIN(CASE WHEN val_a IS NOT NULL THEN rank END) OVER ranked_pid   AS first_a_pos,
    MIN(CASE WHEN val_b IS NOT NULL THEN rank END) OVER ranked_pid   AS first_b_pos,
    MIN(CASE WHEN val_c IS NOT NULL THEN rank END) OVER ranked_pid   AS first_c_pos,
    MAX(CASE WHEN val_a IS NOT NULL THEN rank END) OVER ranked_pid   AS final_a_pos,
    MAX(CASE WHEN val_b IS NOT NULL THEN rank END) OVER ranked_pid   AS final_b_pos,
    MAX(CASE WHEN val_c IS NOT NULL THEN rank END) OVER ranked_pid   AS final_c_pos
  FROM
    test
  WINDOW
    ranked_pid AS (
      PARTITION BY p_id
  --      ORDER BY rank
  --  ROWS BETWEEN unbounded preceding
  --           AND unbounded following
    )
)
SELECT
  p_id,
  MAX(CASE WHEN rank = final_a_pos THEN val_a END) - MAX(CASE WHEN rank = first_a_pos THEN val_a END)  AS change_in_a,
  MAX(CASE WHEN rank = final_b_pos THEN val_b END) - MAX(CASE WHEN rank = first_b_pos THEN val_b END)  AS change_in_b,
  MAX(CASE WHEN rank = final_c_pos THEN val_c END) - MAX(CASE WHEN rank = first_c_pos THEN val_c END)  AS change_in_c
FROM
  analysed
GROUP BY
  p_id
ORDER BY
  p_id

Edit:

Commented out the part of the window definition that isn't needed. It was there while I was playing with FIRST_VALUE() and LAST_VALUE() (but PostgreSQL doesn't support IGNORE NULLS).
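
Since IGNORE NULLS is unavailable, one possible way to emulate it is an ordered array_agg() with a FILTER clause, taking the first element of the resulting array. The following is a minimal sketch and not part of the original answer: it inlines the question's sample rows as a CTE named test so that it runs on its own, and it assumes PostgreSQL 9.4 or later for FILTER.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
WITH test (p_id, rank, val_a, val_b, val_c) AS (
    VALUES
        (2, 1, 1, 2,    3),
        (2, 2, 2, NULL, NULL),
        (2, 3, 4, 4,    1),
        (2, 4, 4, 3,    NULL)
)
SELECT
    p_id,
    -- FILTER drops the NULLs, ORDER BY decides which value lands in slot [1].
    (array_agg(val_a ORDER BY rank DESC) FILTER (WHERE val_a IS NOT NULL))[1]
  - (array_agg(val_a ORDER BY rank ASC)  FILTER (WHERE val_a IS NOT NULL))[1] AS change_in_a,
    (array_agg(val_b ORDER BY rank DESC) FILTER (WHERE val_b IS NOT NULL))[1]
  - (array_agg(val_b ORDER BY rank ASC)  FILTER (WHERE val_b IS NOT NULL))[1] AS change_in_b,
    (array_agg(val_c ORDER BY rank DESC) FILTER (WHERE val_c IS NOT NULL))[1]
  - (array_agg(val_c ORDER BY rank ASC)  FILTER (WHERE val_c IS NOT NULL))[1] AS change_in_c
FROM test
GROUP BY p_id
ORDER BY p_id;
```

For the sample rows this should return 3, 1 and -2 for p_id 2, matching the desired output in the question. If a column is NULL in every row for a participant, the filtered aggregate is NULL and so is the difference.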

Answer 1 (score: 1)

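I would use first_value() and last_value(). The query below gets both ends with first_value() alone, by reversing the sort direction on rank: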

select distinct p_id,
       (first_value(val_a) over (partition by p_id order by (val_a is not null)::int desc, rank desc) -
        first_value(val_a) over (partition by p_id order by (val_a is not null)::int desc, rank asc)
       ) as a_diff,
       (first_value(val_b) over (partition by p_id order by (val_b is not null)::int desc, rank desc) -
        first_value(val_b) over (partition by p_id order by (val_b is not null)::int desc, rank asc)
       ) as b_diff,
       (first_value(val_c) over (partition by p_id order by (val_c is not null)::int desc, rank desc) -
        first_value(val_c) over (partition by p_id order by (val_c is not null)::int desc, rank asc)
       ) as c_diff
from t;

Here is a SQL Fiddle.
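
To see why the sort key works, it can help to look at a single column row by row. The sketch below is an illustration rather than part of the answer: it inlines the question's sample rows as a CTE named t and shows, for val_c only, the (val_c is not null)::int flag next to the two first_value() results.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
WITH t (p_id, rank, val_a, val_b, val_c) AS (
    VALUES
        (2, 1, 1, 2,    3),
        (2, 2, 2, NULL, NULL),
        (2, 3, 4, 4,    1),
        (2, 4, 4, 3,    NULL)
)
SELECT
    rank,
    val_c,
    (val_c IS NOT NULL)::int AS has_value,   -- 1 sorts ahead of 0 under DESC
    first_value(val_c) OVER (
        PARTITION BY p_id
        ORDER BY (val_c IS NOT NULL)::int DESC, rank DESC
    ) AS latest_c,                           -- non-null value with the highest rank
    first_value(val_c) OVER (
        PARTITION BY p_id
        ORDER BY (val_c IS NOT NULL)::int DESC, rank ASC
    ) AS first_c                             -- non-null value with the lowest rank
FROM t
ORDER BY rank;
```

Every one of the four rows should come back with latest_c = 1 and first_c = 3, because first_value() always reads the first row of the sorted partition. That repetition is why the answer's query needs DISTINCT to collapse the result to one row per p_id.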