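I'm trying to measure the net change between the first record of a survey and the latest record for the same question in that survey. Later surveys are often incomplete, so they contain null values.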
Survey answers from one participant:
┌─────────────────────────────────────┐
| p_id | rank | val_a | val_b | val_c |
| 2 | 1 | 1 | 2 | 3 |
| 2 | 2 | 2 | | |
| 2 | 3 | 4 | 4 | 1 |
| 2 | 4 | 4 | 3 | |
└─────────────────────────────────────┘
Desired output:
┌──────────────────────────────┐
| p_id | val_a | val_b | val_c |
| 2 | 3 | 1 | -2 |
└──────────────────────────────┘
a = row4 - row1
b = row4 - row1
c = row3 - row1 (uses the rank3 value since rank4 has none)
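For each column, the result should show the difference between the value in the highest-ranked row where that column is non-null and the value of the same column in the first row, which should never be null.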
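As a minimal sketch of that rule, here it is in plain Python rather than SQL. It uses a hypothetical in-memory copy of the sample rows, with None standing for an empty cell, and computes the latest non-null value minus the first row's value for each column:

```python
# Hypothetical in-memory copy of the sample survey rows for p_id 2;
# None stands for an empty (NULL) cell.
rows = [
    {"rank": 1, "val_a": 1, "val_b": 2, "val_c": 3},
    {"rank": 2, "val_a": 2, "val_b": None, "val_c": None},
    {"rank": 3, "val_a": 4, "val_b": 4, "val_c": 1},
    {"rank": 4, "val_a": 4, "val_b": 3, "val_c": None},
]


def net_change(rows, column):
    """Latest non-null value of `column` minus its value in the first row."""
    ordered = sorted(rows, key=lambda r: r["rank"])
    first = ordered[0][column]  # the first row is assumed never to be null
    # Walk from the highest rank downwards and stop at the first non-null value.
    latest = next(r[column] for r in reversed(ordered) if r[column] is not None)
    return latest - first


diffs = {col: net_change(rows, col) for col in ("val_a", "val_b", "val_c")}
print(diffs)  # {'val_a': 3, 'val_b': 1, 'val_c': -2}
```

For val_c the loop skips the empty rank-4 cell and uses the rank-3 value, which gives 1 - 3 = -2.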
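So far I have code that computes the difference between two rows. However, I can't work out how to handle the nulls when an earlier (lower-ranked) row has a non-null value that could be used instead.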
SELECT
    p_id,
    CASE WHEN ("p_val_a" IS NOT null) AND (rank != 1) AND ("val_a" IS NOT null) THEN "val_a" - "p_val_a" ELSE NULL END AS "diff_val_a",
    CASE WHEN ("p_val_b" IS NOT null) AND (rank != 1) AND ("val_b" IS NOT null) THEN "val_b" - "p_val_b" ELSE NULL END AS "diff_val_b",
    CASE WHEN ("p_val_c" IS NOT null) AND (rank != 1) AND ("val_c" IS NOT null) THEN "val_c" - "p_val_c" ELSE NULL END AS "diff_val_c"
FROM
(
    SELECT
        p_id,
        rank,  -- needed so the outer CASE expressions can test rank != 1
        "val_a",
        "val_b",
        "val_c",
        LAG("val_a") OVER w AS "p_val_a",
        LAG("val_b") OVER w AS "p_val_b",
        LAG("val_c") OVER w AS "p_val_c"
    FROM
        dataset
    WINDOW w AS (
        PARTITION BY p_id
        ORDER BY rank
    )
) t;
In the example above, if I query only the first and last rows, val_a and val_b give the correct result, but val_c yields null instead of -2.
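This behaviour can be reproduced with a small sketch. It assumes an in-memory SQLite database stands in for PostgreSQL, loads the sample rows into the `dataset` table, selects `rank` in the inner query so the outer CASE can see it, and keeps only ranks 1 and 4:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (p_id INTEGER, rank INTEGER, "
    "val_a INTEGER, val_b INTEGER, val_c INTEGER)"
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?, ?)",
    [(2, 1, 1, 2, 3), (2, 2, 2, None, None), (2, 3, 4, 4, 1), (2, 4, 4, 3, None)],
)

# The question's LAG() query, restricted to the first and last rows.
query = """
SELECT
    p_id,
    rank,
    CASE WHEN "p_val_a" IS NOT NULL AND rank != 1 AND "val_a" IS NOT NULL
         THEN "val_a" - "p_val_a" ELSE NULL END AS diff_val_a,
    CASE WHEN "p_val_b" IS NOT NULL AND rank != 1 AND "val_b" IS NOT NULL
         THEN "val_b" - "p_val_b" ELSE NULL END AS diff_val_b,
    CASE WHEN "p_val_c" IS NOT NULL AND rank != 1 AND "val_c" IS NOT NULL
         THEN "val_c" - "p_val_c" ELSE NULL END AS diff_val_c
FROM (
    SELECT
        p_id, rank, "val_a", "val_b", "val_c",
        LAG("val_a") OVER w AS "p_val_a",
        LAG("val_b") OVER w AS "p_val_b",
        LAG("val_c") OVER w AS "p_val_c"
    FROM dataset
    WHERE rank IN (1, 4)  -- only the first and last survey rows
    WINDOW w AS (PARTITION BY p_id ORDER BY rank)
) t
ORDER BY rank
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, 1, None, None, None), (2, 4, 3, 1, None)]
```

The rank-4 row gives 3 and 1 for val_a and val_b, but NULL for val_c: LAG() only compares against the immediately preceding row, and the rank-4 val_c is empty, so there is nothing to fall back to.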
How can I compare the first row's value with the value of the same column in the latest row where that column is non-null?
Answer 0 (score: 1)
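Analytic (window) functions can find the row (rank) where each column first has a non-NULL value, and again where it last has one.
Conditional aggregation can then pick out the values at those positions.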
http://sqlfiddle.com/#!17/78886/9
WITH
analysed AS (
    SELECT
        *,
        MIN(CASE WHEN val_a IS NOT NULL THEN rank END) OVER ranked_pid AS first_a_pos,
        MIN(CASE WHEN val_b IS NOT NULL THEN rank END) OVER ranked_pid AS first_b_pos,
        MIN(CASE WHEN val_c IS NOT NULL THEN rank END) OVER ranked_pid AS first_c_pos,
        MAX(CASE WHEN val_a IS NOT NULL THEN rank END) OVER ranked_pid AS final_a_pos,
        MAX(CASE WHEN val_b IS NOT NULL THEN rank END) OVER ranked_pid AS final_b_pos,
        MAX(CASE WHEN val_c IS NOT NULL THEN rank END) OVER ranked_pid AS final_c_pos
    FROM
        test
    WINDOW
        ranked_pid AS (
            PARTITION BY p_id
            -- ORDER BY rank
            -- ROWS BETWEEN unbounded preceding
            --          AND unbounded following
        )
)
SELECT
    p_id,
    MAX(CASE WHEN rank = final_a_pos THEN val_a END) - MAX(CASE WHEN rank = first_a_pos THEN val_a END) AS change_in_a,
    MAX(CASE WHEN rank = final_b_pos THEN val_b END) - MAX(CASE WHEN rank = first_b_pos THEN val_b END) AS change_in_b,
    MAX(CASE WHEN rank = final_c_pos THEN val_c END) - MAX(CASE WHEN rank = first_c_pos THEN val_c END) AS change_in_c
FROM
    analysed
GROUP BY
    p_id
ORDER BY
    p_id
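A quick way to check this answer against the sample data is to run it on an in-memory SQLite database. SQLite is an assumption made only so the example is self-contained; a version with window-function support is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (p_id INTEGER, rank INTEGER, "
    "val_a INTEGER, val_b INTEGER, val_c INTEGER)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?)",
    [(2, 1, 1, 2, 3), (2, 2, 2, None, None), (2, 3, 4, 4, 1), (2, 4, 4, 3, None)],
)

query = """
WITH analysed AS (
    SELECT
        *,
        -- first / last rank at which each column is non-NULL
        MIN(CASE WHEN val_a IS NOT NULL THEN rank END) OVER ranked_pid AS first_a_pos,
        MIN(CASE WHEN val_b IS NOT NULL THEN rank END) OVER ranked_pid AS first_b_pos,
        MIN(CASE WHEN val_c IS NOT NULL THEN rank END) OVER ranked_pid AS first_c_pos,
        MAX(CASE WHEN val_a IS NOT NULL THEN rank END) OVER ranked_pid AS final_a_pos,
        MAX(CASE WHEN val_b IS NOT NULL THEN rank END) OVER ranked_pid AS final_b_pos,
        MAX(CASE WHEN val_c IS NOT NULL THEN rank END) OVER ranked_pid AS final_c_pos
    FROM test
    WINDOW ranked_pid AS (PARTITION BY p_id)
)
SELECT
    p_id,
    -- conditional aggregation picks the value at each stored position
    MAX(CASE WHEN rank = final_a_pos THEN val_a END)
      - MAX(CASE WHEN rank = first_a_pos THEN val_a END) AS change_in_a,
    MAX(CASE WHEN rank = final_b_pos THEN val_b END)
      - MAX(CASE WHEN rank = first_b_pos THEN val_b END) AS change_in_b,
    MAX(CASE WHEN rank = final_c_pos THEN val_c END)
      - MAX(CASE WHEN rank = first_c_pos THEN val_c END) AS change_in_c
FROM analysed
GROUP BY p_id
ORDER BY p_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(2, 3, 1, -2)]
```

The single row matches the desired output: val_c's last non-NULL position is rank 3, so its change is 1 - 3 = -2.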
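Edit:
Commented out the part of the window definition that isn't needed. It was left over from when I was experimenting with FIRST_VALUE() and LAST_VALUE() (but PostgreSQL doesn't support IGNORE NULLS).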
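A small sketch shows why the commented-out frame clause mattered for LAST_VALUE(), and why LAST_VALUE() still falls short without IGNORE NULLS. As before, it assumes an in-memory SQLite database in place of PostgreSQL, with the sample rows in `test`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (p_id INTEGER, rank INTEGER, "
    "val_a INTEGER, val_b INTEGER, val_c INTEGER)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?)",
    [(2, 1, 1, 2, 3), (2, 2, 2, None, None), (2, 3, 4, 4, 1), (2, 4, 4, 3, None)],
)

query = """
SELECT
    rank,
    -- default frame: from the partition start up to the current row
    LAST_VALUE(val_c) OVER (PARTITION BY p_id ORDER BY rank) AS default_frame,
    -- explicit frame covering the whole partition
    LAST_VALUE(val_c) OVER (
        PARTITION BY p_id ORDER BY rank
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS full_frame
FROM test
ORDER BY rank
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 3, None), (2, None, None), (3, 1, None), (4, None, None)]
```

With the default frame, LAST_VALUE() only sees rows up to the current one, so it returns the current row's val_c. With the full frame it returns the rank-4 value, which is NULL, so it cannot reach the rank-3 value of 1. Tracking the first and last non-NULL positions with MIN() and MAX() avoids the need for IGNORE NULLS.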
Answer 1 (score: 1)
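I would use first_value(), ordering the non-NULL rows first and sorting by rank descending for the latest value and ascending for the first value: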
select distinct p_id,
       (first_value(val_a) over (partition by p_id order by (val_a is not null)::int desc, rank desc) -
        first_value(val_a) over (partition by p_id order by (val_a is not null)::int desc, rank asc)
       ) as a_diff,
       (first_value(val_b) over (partition by p_id order by (val_b is not null)::int desc, rank desc) -
        first_value(val_b) over (partition by p_id order by (val_b is not null)::int desc, rank asc)
       ) as b_diff,
       (first_value(val_c) over (partition by p_id order by (val_c is not null)::int desc, rank desc) -
        first_value(val_c) over (partition by p_id order by (val_c is not null)::int desc, rank asc)
       ) as c_diff
from t;
Here is a SQL Fiddle.
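The same check can be run for this answer. The sketch below assumes an in-memory SQLite database instead of PostgreSQL, so the PostgreSQL cast `(... is not null)::int` is written as `CAST(... AS INTEGER)`; the ordering logic is otherwise unchanged:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (p_id INTEGER, rank INTEGER, "
    "val_a INTEGER, val_b INTEGER, val_c INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(2, 1, 1, 2, 3), (2, 2, 2, None, None), (2, 3, 4, 4, 1), (2, 4, 4, 3, None)],
)

# Non-NULL rows sort first; rank desc yields the latest value, rank asc the earliest.
query = """
SELECT DISTINCT p_id,
    (FIRST_VALUE(val_a) OVER (PARTITION BY p_id
        ORDER BY CAST(val_a IS NOT NULL AS INTEGER) DESC, rank DESC)
     - FIRST_VALUE(val_a) OVER (PARTITION BY p_id
        ORDER BY CAST(val_a IS NOT NULL AS INTEGER) DESC, rank ASC)) AS a_diff,
    (FIRST_VALUE(val_b) OVER (PARTITION BY p_id
        ORDER BY CAST(val_b IS NOT NULL AS INTEGER) DESC, rank DESC)
     - FIRST_VALUE(val_b) OVER (PARTITION BY p_id
        ORDER BY CAST(val_b IS NOT NULL AS INTEGER) DESC, rank ASC)) AS b_diff,
    (FIRST_VALUE(val_c) OVER (PARTITION BY p_id
        ORDER BY CAST(val_c IS NOT NULL AS INTEGER) DESC, rank DESC)
     - FIRST_VALUE(val_c) OVER (PARTITION BY p_id
        ORDER BY CAST(val_c IS NOT NULL AS INTEGER) DESC, rank ASC)) AS c_diff
FROM t
"""
result = conn.execute(query).fetchall()
print(result)  # [(2, 3, 1, -2)]
```

Because the default window frame always starts at the first row of the partition, FIRST_VALUE() returns the same value on every row, and DISTINCT collapses the four rows into one.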