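I have a table in PostgreSQL like the one below. I want the latest record for each id, and if that latest record has a NULL in any column, I want to replace it with the most recent non-NULL value from the same column for that id.
data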
id ingdt code gender address
1 27-10-2018 NULL NULL street1
1 24-10-2018 1234 NULL street2
1 20-08-2017 3245 M street2
2 24-09-2018 NULL F Astreet
2 24-10-2018 2857 F Bstreet
3 24-08-2018 3489 M NULL
3 22-08-2018 5802 M Cstreet
Expected output
final_output
id ingdt code gender address
1 27-10-2018 1234 M street1
2 24-10-2018 2857 F Bstreet
3 24-08-2018 3489 M Cstreet
What I tried
insert into final_output select * from (
(select code, id from data where code != null order by ingdt limit 1) x join
(select gender, id from data where gender != null order by ingdt limit 1) y join
(select address, id from data where address != null order by ingdt limit 1)z on y.id=x.id)
Answer (score: 1)
Window functions can help here:
SELECT DISTINCT
    id,
    max(ingdt) OVER (PARTITION BY id) AS ingdt,
    first_value(code) OVER (PARTITION BY id ORDER BY code IS NULL, ingdt DESC) AS code,
    first_value(gender) OVER (PARTITION BY id ORDER BY gender IS NULL, ingdt DESC) AS gender,
    first_value(address) OVER (PARTITION BY id ORDER BY address IS NULL, ingdt DESC) AS address
FROM mytable
ORDER BY id;
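To check the query against the sample data without a PostgreSQL server, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite is a swapped-in engine (it supports these window functions from version 3.25 on), the dates are assumed to be stored as ISO YYYY-MM-DD strings so that max() and ORDER BY compare them chronologically, and the helper name latest_non_null is made up for illustration.

```python
import sqlite3

# Sample data from the question. Assumption: dates rewritten as ISO
# 'YYYY-MM-DD' strings so text comparison matches chronological order.
ROWS = [
    (1, "2018-10-27", None, None, "street1"),
    (1, "2018-10-24", 1234, None, "street2"),
    (1, "2017-08-20", 3245, "M", "street2"),
    (2, "2018-09-24", None, "F", "Astreet"),
    (2, "2018-10-24", 2857, "F", "Bstreet"),
    (3, "2018-08-24", 3489, "M", None),
    (3, "2018-08-22", 5802, "M", "Cstreet"),
]

# The answer's query, unchanged apart from running on SQLite instead of PostgreSQL.
QUERY = """
SELECT DISTINCT
    id,
    max(ingdt) OVER (PARTITION BY id) AS ingdt,
    first_value(code) OVER (PARTITION BY id ORDER BY code IS NULL, ingdt DESC) AS code,
    first_value(gender) OVER (PARTITION BY id ORDER BY gender IS NULL, ingdt DESC) AS gender,
    first_value(address) OVER (PARTITION BY id ORDER BY address IS NULL, ingdt DESC) AS address
FROM mytable
ORDER BY id
"""


def latest_non_null(rows):
    """Load rows into an in-memory table and run the window-function query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER, ingdt TEXT, code INTEGER, "
        "gender TEXT, address TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_non_null(ROWS):
        print(row)
```

Each printed tuple should correspond to one line of final_output in the question, with dates in ISO form.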
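Explanation of first_value(...) OVER (...):
A window function lets you split your rows into separate groups, called partitions. This is done with the PARTITION BY keyword. Here I create one partition per id.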
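Next, I check whether the column's value is NULL, which yields true or false. I sort by that result like any boolean column; in ascending order false (meaning NOT NULL) comes before true, so the non-NULL rows come first. If several rows are NOT NULL, the most recent one comes first (ingdt DESC). This sorting is done separately within each partition.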
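first_value() then returns the value from the first row of that sorted partition: the most recent non-NULL value, or NULL only if the column is NULL in every row for that id.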
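One step the explanation leaves implicit is why DISTINCT is needed: a window function does not collapse rows, so every row of a partition gets the same computed values, and DISTINCT reduces them to one row per id. The sketch below shows this for id 2 using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption), with ISO dates and a hypothetical helper evaluate_window; the max() column is aliased latest_ingdt here only to keep it visibly separate from the ingdt column.

```python
import sqlite3

# Rows for id 2 from the question, code column only (ISO dates assumed).
ID2_ROWS = [
    (2, "2018-09-24", None),
    (2, "2018-10-24", 2857),
]

WINDOW_COLUMNS = """
    id,
    max(ingdt) OVER (PARTITION BY id) AS latest_ingdt,
    first_value(code) OVER (PARTITION BY id ORDER BY code IS NULL, ingdt DESC) AS code
"""


def evaluate_window(rows, distinct):
    """Evaluate the window columns, with or without DISTINCT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, ingdt TEXT, code INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    keyword = "DISTINCT" if distinct else ""
    result = conn.execute(f"SELECT {keyword} {WINDOW_COLUMNS} FROM t").fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(evaluate_window(ID2_ROWS, distinct=False))  # one row per input row
    print(evaluate_window(ID2_ROWS, distinct=True))   # duplicates collapsed
```

Without DISTINCT both input rows come back as the same tuple; with it, only one remains.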