I am not sure of the right words for this question, so I will break it down.
I have a table as follows:
date_time | a | b | c
The last 4 rows:
15/10/2013 11:45:00 | null | 'timtim' | 'fred'
15/10/2013 13:00:00 | 'tune' | 'reco' | null
16/10/2013 12:00:00 | 'abc' | null | null
16/10/2013 13:00:00 | null | 'died' | null
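How do I get the last record, but ignore the nulls and take the value from a previous record instead?
In the example I provided, the returned row would be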
16/10/2013 13:00:00 | 'abc' | 'died' | 'fred'
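As you can see, if the value for a column is null, it goes back to the last record that has a value for that column and uses that value.
This should be possible, I just can't figure it out. So far I have only come up with: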
select
last_value(a) over w a
from test
WINDOW w AS (
partition by a
ORDER BY ts asc
range between current row and unbounded following
);
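but this only works for a single column...
Answer 0 (score: 1)
Here I create an aggregate function that collects a column into an array. Then it is just a matter of removing the NULLs and selecting the last element from each array.
Sample data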
CREATE TABLE T (
date_time timestamp,
a text,
b text,
c text
);
INSERT INTO T VALUES ('2013-10-15 11:45:00', NULL, 'timtim', 'fred'),
('2013-10-15 13:00:00', 'tune', 'reco', NULL ),
('2013-10-16 12:00:00', 'abc', NULL, NULL ),
('2013-10-16 13:00:00', NULL, 'died', NULL );
Solution
CREATE AGGREGATE array_accum (anyelement)
(
sfunc = array_append,
stype = anyarray,
initcond = '{}'
);
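WITH latest_nonull AS (
SELECT MAX(date_time) As MaxDateTime,
-- ORDER BY inside each aggregate call fixes the order in which values are collected;
-- a query-level ORDER BY date_time is not allowed next to aggregates without GROUP BY
array_remove(array_accum(a ORDER BY date_time), NULL) AS A,
array_remove(array_accum(b ORDER BY date_time), NULL) AS B,
array_remove(array_accum(c ORDER BY date_time), NULL) AS C
FROM T
)
-- The last element of each NULL-free array is the latest non-null value
SELECT MaxDateTime, A[array_upper(A, 1)], B[array_upper(B,1)], C[array_upper(C,1)]
FROM latest_nonull;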
Result
maxdatetime | a | b | c
---------------------+-----+------+------
2013-10-16 13:00:00 | abc | died | fred
(1 row)
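The same idea can probably be expressed without a custom aggregate. The following is only a sketch, assuming the sample table T from this answer and PostgreSQL 9.4 or later (for the FILTER clause): the built-in array_agg() collects the non-null values, newest first, and the first array element is the latest one.

```sql
-- Sketch only: relies on the sample table T created above.
-- FILTER drops NULLs before aggregation; ORDER BY ... DESC puts the newest value first.
SELECT max(date_time) AS maxdatetime
     , (array_agg(a ORDER BY date_time DESC) FILTER (WHERE a IS NOT NULL))[1] AS a
     , (array_agg(b ORDER BY date_time DESC) FILTER (WHERE b IS NOT NULL))[1] AS b
     , (array_agg(c ORDER BY date_time DESC) FILTER (WHERE c IS NOT NULL))[1] AS c
FROM   T;
```

If a column holds no non-null value at all, array_agg() returns NULL and the subscript yields NULL as well, so no special case is needed.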
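Answer 1 (score: 1)
The "last row" and the sort order need to be defined unambiguously. There is no natural order in a set (or a table). I am assuming ORDER BY ts, where ts is the timestamp column.
As @Jorge pointed out in his comment: if ts is not UNIQUE, you need to define tiebreakers for the sort order to make it unambiguous (add more items to ORDER BY). A primary key would be the ultimate solution.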
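To illustrate why a tiebreaker matters, here is a small sketch with two rows that share the same ts. The id column is a hypothetical unique key that is not part of the original table. With ORDER BY ts alone, both rows are peers and get the same running count; adding id separates them.

```sql
-- Sketch only: "id" is an assumed unique column used as tiebreaker.
WITH t(id, ts, a) AS (
   VALUES
     (1, timestamp '2013-10-16 13:00:00', 'x'::text)
   , (2, '2013-10-16 13:00:00', 'y')
)
SELECT id, ts, a
     , count(a) OVER (ORDER BY ts)     AS grp_ts_only  -- 2 for both rows (peers)
     , count(a) OVER (ORDER BY ts, id) AS grp_ts_id    -- 1, then 2
FROM   t
ORDER  BY ts, id;
```

Without the tiebreaker both rows land in the same group, so it is no longer well defined which of the two values counts as the "last" one.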
SELECT ts
,max(a) OVER (PARTITION BY grp_a) AS a
,max(b) OVER (PARTITION BY grp_b) AS b
,max(c) OVER (PARTITION BY grp_c) AS c
FROM (
SELECT *
,count(a) OVER (ORDER BY ts) AS grp_a
,count(b) OVER (ORDER BY ts) AS grp_b
,count(c) OVER (ORDER BY ts) AS grp_c
FROM t
) sub;
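The aggregate function count() ignores NULL values when counting. Used as an aggregate window function, it computes a running count of the column according to the default window definition, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. As a result, the count gets "stuck" on rows with NULL values, which forms peer groups that share the same (non-null) value.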
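The grouping step can be made visible by selecting the running counts on their own. This is a self-contained sketch that rebuilds the question's four rows in a VALUES list (the CTE stands in for the real table t):

```sql
-- Sketch only: the CTE recreates the sample rows from the question.
WITH t(ts, a, b, c) AS (
   VALUES
     (timestamp '2013-10-15 11:45:00', NULL::text, 'timtim'::text, 'fred'::text)
   , ('2013-10-15 13:00:00', 'tune', 'reco', NULL)
   , ('2013-10-16 12:00:00', 'abc' , NULL  , NULL)
   , ('2013-10-16 13:00:00', NULL  , 'died', NULL)
)
SELECT ts
     , a, count(a) OVER w AS grp_a
     , b, count(b) OVER w AS grp_b
     , c, count(c) OVER w AS grp_c
FROM   t
WINDOW w AS (ORDER BY ts)
ORDER  BY ts;

--          ts          |  a   | grp_a |   b    | grp_b |  c   | grp_c
-- ---------------------+------+-------+--------+-------+------+-------
--  2013-10-15 11:45:00 |      |     0 | timtim |     1 | fred |     1
--  2013-10-15 13:00:00 | tune |     1 | reco   |     2 |      |     1
--  2013-10-16 12:00:00 | abc  |     2 |        |     2 |      |     1
--  2013-10-16 13:00:00 |      |     2 | died   |     3 |      |     1
```

Each group holds at most one non-null value, followed by the NULL rows that inherit it. Group 0 of column a contains only a NULL, so nothing precedes it and the result for that row stays NULL.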
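In a second window function, the only non-null value per group is easily extracted with max().
The query at the top of this answer returns all rows with the gaps filled in. To get only the last row, sort descending and add LIMIT 1 (written here with a CTE and a named window):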
WITH cte AS (
SELECT *
,count(a) OVER w AS grp_a
,count(b) OVER w AS grp_b
,count(c) OVER w AS grp_c
FROM t
WINDOW w AS (ORDER BY ts)
)
SELECT ts
,max(a) OVER (PARTITION BY grp_a) AS a
,max(b) OVER (PARTITION BY grp_b) AS b
,max(c) OVER (PARTITION BY grp_c) AS c
FROM cte
ORDER BY ts DESC
LIMIT 1;
Simple alternatives that compute only the last row:
SELECT ts
,COALESCE(a, (SELECT a FROM t WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS a
,COALESCE(b, (SELECT b FROM t WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS b
,COALESCE(c, (SELECT c FROM t WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS c
FROM t
ORDER BY ts DESC
LIMIT 1;
Or:
SELECT (SELECT ts FROM t ORDER BY ts DESC LIMIT 1) AS ts
,(SELECT a FROM t WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1) AS a
,(SELECT b FROM t WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1) AS b
,(SELECT c FROM t WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1) AS c;
While this should be decently fast, if performance is your primary requirement, I would use a plpgsql function. Start with the last row and loop downwards until you have a non-null value for every required column. Along these lines:
GROUP BY and aggregate sequential numeric values
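As a rough illustration of that loop (not the code from the linked answer), a plpgsql function might look like the sketch below. It assumes a table t(ts timestamp NOT NULL, a text, b text, c text); the function name and the o_ prefixed OUT parameters are made up for this example.

```sql
-- Sketch only: f_last_known_values and the o_* names are hypothetical.
CREATE OR REPLACE FUNCTION f_last_known_values(
   OUT o_ts timestamp, OUT o_a text, OUT o_b text, OUT o_c text)
  LANGUAGE plpgsql STABLE AS
$func$
DECLARE
   r record;
BEGIN
   FOR r IN
      SELECT t.ts, t.a, t.b, t.c
      FROM   t
      ORDER  BY t.ts DESC            -- walk from the newest row backwards
   LOOP
      o_ts := COALESCE(o_ts, r.ts);  -- stays at the newest ts (assumes ts is NOT NULL)
      o_a  := COALESCE(o_a, r.a);    -- keep the first non-null value found
      o_b  := COALESCE(o_b, r.b);
      o_c  := COALESCE(o_c, r.c);

      -- stop as soon as every column has a value
      EXIT WHEN o_a IS NOT NULL AND o_b IS NOT NULL AND o_c IS NOT NULL;
   END LOOP;
END
$func$;

SELECT * FROM f_last_known_values();
```

The loop reads only as many rows as it needs, which is where the potential gain over the set-based queries would come from. An index on ts should help the descending scan.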
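Answer 2 (score: 0)
This should work, but keep in mind that it orders rows by ctid (the physical row location), which is not guaranteed to match insertion order once rows have been updated or deleted.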
select * from
(select dt from
(select rank() over (order by ctid desc) idx, dt
from sometable ) cx
where idx = 1) dtz,
(
select a from
(select rank() over (order by ctid desc) idx, a
from sometable where a is not null ) ax
where idx = 1) az,
(
select b from
(select rank() over (order by ctid desc) idx, b
from sometable where b is not null ) bx
where idx = 1) bz,
(
select c from
(select rank() over (order by ctid desc) idx, c
from sometable where c is not null ) cx
where idx = 1) cz
See it on SQL Fiddle: http://sqlfiddle.com/#!15/d5940/40
The result will be
DT A B C
October, 16 2013 00:00:00+0000 abc died fred