Retrieve the last known value for each column of a row

Time: 2013-11-27 15:18:40

Tags: sql postgresql null postgresql-9.2 window-functions

I am not sure of the correct words for this question, so I will break it down.

I have a table as follows:

date_time | a | b | c

The last 4 rows:

15/10/2013 11:45:00 | null   | 'timtim' | 'fred'
15/10/2013 13:00:00 | 'tune' | 'reco'   | null
16/10/2013 12:00:00 | 'abc'  | null     | null
16/10/2013 13:00:00 | null   | 'died'   | null

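How do I get the last record, but with nulls ignored, so that each null is replaced by the value from an earlier record?

In the example I provided, the returned row would be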

16/10/2013 13:00:00 | 'abc' | 'died' | 'fred'

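As you can see, if the value of a column is null, the query should go back to the most recent record that has a value for that column and use that value.

This should be possible, I just can't figure it out. So far I have only come up with: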

select 
    last_value(a) over w a
from test
WINDOW w AS (
    partition by a
    ORDER BY ts asc
    range between current row and unbounded following
    );

But this only works for a single column ...

3 answers:

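Answer 0: (score: 1)

Here I create an aggregate function that collects the values of a column into an array. Then it is just a matter of removing the NULLs and selecting the last element from each array.

Sample data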

CREATE TABLE T (
    date_time timestamp,
    a text,
    b text,
    c text
);

INSERT INTO T VALUES ('2013-10-15 11:45:00', NULL, 'timtim', 'fred'),
('2013-10-15 13:00:00', 'tune', 'reco', NULL  ),
('2013-10-16 12:00:00', 'abc', NULL, NULL     ),
('2013-10-16 13:00:00', NULL, 'died', NULL    );

Solution

CREATE AGGREGATE array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);

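WITH latest_nonull AS (
    -- The ORDER BY goes inside each aggregate so the values are collected
    -- in date_time order; an outer ORDER BY date_time is rejected here
    -- because date_time is neither grouped nor aggregated.
    SELECT MAX(date_time) AS MaxDateTime,
           array_remove(array_accum(a ORDER BY date_time), NULL) AS A,
           array_remove(array_accum(b ORDER BY date_time), NULL) AS B,
           array_remove(array_accum(c ORDER BY date_time), NULL) AS C
    FROM T
)
-- The last element of each NULL-free array is the last known value.
SELECT MaxDateTime, A[array_upper(A, 1)], B[array_upper(B,1)], C[array_upper(C,1)]
FROM latest_nonull;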

Result

     maxdatetime     |  a  |  b   |  c
---------------------+-----+------+------
 2013-10-16 13:00:00 | abc | died | fred
(1 row)

Answer 1: (score: 1)

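Order of rows

The "last row" and the sort order need to be defined unambiguously. There is no natural order in a set (or a table). I am assuming ORDER BY ts, where ts is the timestamp column. As @Jorge pointed out in his comment: if ts is not UNIQUE, you need to define tiebreakers for the sort order to make it unambiguous (add more items to ORDER BY). A primary key is the ultimate solution.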
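
As a small illustration of the tiebreaker point, the sketch below uses a hypothetical unique column id, which is not part of the question's table, and supplies two rows with identical ts through an inline VALUES list. It is only meant to show where the extra ORDER BY item goes.

```sql
-- Two rows share the same ts; id is a hypothetical unique column
-- introduced only for this illustration.
WITH t(id, ts, a) AS (
   VALUES
      (1, timestamp '2013-10-16 13:00:00', 'first'::text)
     ,(2, timestamp '2013-10-16 13:00:00', 'second'::text)
   )
SELECT id, ts, a
FROM   t
ORDER  BY ts DESC, id DESC  -- id breaks the tie deterministically
LIMIT  1;                   -- returns the row with id = 2
```

Without the second ORDER BY item, either of the two rows could legitimately come back as the "last" one.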

General solution with window functions, to get a result for every row in the table

SELECT ts
      ,max(a) OVER (PARTITION BY grp_a) AS a
      ,max(b) OVER (PARTITION BY grp_b) AS b
      ,max(c) OVER (PARTITION BY grp_c) AS c
FROM (
   SELECT *
         ,count(a) OVER (ORDER BY ts) AS grp_a
         ,count(b) OVER (ORDER BY ts) AS grp_b
         ,count(c) OVER (ORDER BY ts) AS grp_c
   FROM t
   ) sub;

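How?

The aggregate function count() ignores NULL values when counting. Used as an aggregate window function with ORDER BY, it computes the running count of a column according to the default window definition, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. As a result, the count gets "stuck" on rows with NULL values, which forms peer groups that should share the same (non-null) value.
In a second window function, the only non-null value per group is easily extracted with max().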
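
To make the grouping concrete, here is a sketch that runs only the inner step against the four sample rows from the question. The rows are supplied inline through a VALUES list so the snippet does not depend on an existing table, and the column name ts follows this answer's assumption. The trailing comments show the running counts worked out by hand.

```sql
-- Sample rows from the question, inlined; ts is the assumed timestamp column.
WITH t(ts, a, b, c) AS (
   VALUES
      (timestamp '2013-10-15 11:45:00', NULL::text, 'timtim'::text, 'fred'::text)
     ,(timestamp '2013-10-15 13:00:00', 'tune', 'reco', NULL)
     ,(timestamp '2013-10-16 12:00:00', 'abc',  NULL,   NULL)
     ,(timestamp '2013-10-16 13:00:00', NULL,   'died', NULL)
   )
SELECT ts, a, b, c
      ,count(a) OVER (ORDER BY ts) AS grp_a  -- running count of non-null a
      ,count(b) OVER (ORDER BY ts) AS grp_b
      ,count(c) OVER (ORDER BY ts) AS grp_c
FROM   t
ORDER  BY ts;

-- ts                  | a    | b      | c    | grp_a | grp_b | grp_c
-- 2013-10-15 11:45:00 | NULL | timtim | fred |     0 |     1 |     1
-- 2013-10-15 13:00:00 | tune | reco   | NULL |     1 |     2 |     1
-- 2013-10-16 12:00:00 | abc  | NULL   | NULL |     2 |     2 |     1
-- 2013-10-16 13:00:00 | NULL | died   | NULL |     2 |     3 |     1
```

The last two rows share grp_a = 2, so max(a) over that group yields 'abc' for both of them. All four rows share grp_c = 1, so 'fred' is carried forward to every row.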

Just the last row

WITH cte AS (
   SELECT *
         ,count(a) OVER w AS grp_a
         ,count(b) OVER w AS grp_b
         ,count(c) OVER w AS grp_c
   FROM   t
   WINDOW w AS (ORDER BY ts)
   ) 
SELECT ts
      ,max(a) OVER (PARTITION BY grp_a) AS a
      ,max(b) OVER (PARTITION BY grp_b) AS b
      ,max(c) OVER (PARTITION BY grp_c) AS c
FROM   cte
ORDER  BY ts DESC
LIMIT  1;

Simple alternatives for the last row

SELECT ts
      ,COALESCE(a, (SELECT a FROM t WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS a
      ,COALESCE(b, (SELECT b FROM t WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS b
      ,COALESCE(c, (SELECT c FROM t WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS c
FROM   t
ORDER  BY ts DESC
LIMIT  1;


SELECT (SELECT ts FROM t                     ORDER BY ts DESC LIMIT 1) AS ts
      ,(SELECT a  FROM t WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1) AS a
      ,(SELECT b  FROM t WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1) AS b
      ,(SELECT c  FROM t WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1) AS c

-> SQLfiddle

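Performance

While this should be decently fast, if performance is your paramount requirement, I would use a plpgsql function. Start at the last row and loop downward until you have a non-null value for every required column. Along these lines:
GROUP BY and aggregate sequential numeric values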
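
The loop described above might look roughly like the following. This is only a sketch under stated assumptions, not the code from the linked answer: the function name f_last_known is hypothetical, the table is assumed to be t(ts, a, b, c) with ts declared NOT NULL, and a temporary table holding the question's sample rows is created so that the snippet runs on its own.

```sql
-- Assumed setup: the question's sample rows in a temp table t with a NOT NULL ts.
CREATE TEMP TABLE t (ts timestamp NOT NULL, a text, b text, c text);
INSERT INTO t VALUES
   ('2013-10-15 11:45:00', NULL,   'timtim', 'fred')
  ,('2013-10-15 13:00:00', 'tune', 'reco',   NULL)
  ,('2013-10-16 12:00:00', 'abc',  NULL,     NULL)
  ,('2013-10-16 13:00:00', NULL,   'died',   NULL);

-- f_last_known is a hypothetical name chosen for this illustration.
CREATE OR REPLACE FUNCTION f_last_known(OUT ts timestamp, OUT a text, OUT b text, OUT c text)
  LANGUAGE plpgsql STABLE AS
$func$
DECLARE
   r record;
BEGIN
   -- Walk the rows from newest to oldest; columns are table-qualified so
   -- they cannot be confused with the OUT parameters of the same name.
   FOR r IN
      SELECT t.ts, t.a, t.b, t.c FROM t ORDER BY t.ts DESC
   LOOP
      ts := COALESCE(ts, r.ts);  -- set once, from the newest row
      a  := COALESCE(a, r.a);    -- keep the first non-null value seen
      b  := COALESCE(b, r.b);
      c  := COALESCE(c, r.c);
      -- Stop as soon as every column has a value.
      EXIT WHEN a IS NOT NULL AND b IS NOT NULL AND c IS NOT NULL;
   END LOOP;
END
$func$;

SELECT * FROM f_last_known();
-- 2013-10-16 13:00:00 | abc | died | fred
```

The possible gain is that the loop stops as soon as all columns are filled, so with an index on ts it would usually read only a few rows from the end of the table.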

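Answer 2: (score: 0)

This should work, but keep in mind that it is not a pretty solution: it ranks rows by ctid, i.e. by physical row location, rather than by a timestamp column.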

select * from
(select dt from
(select rank() over (order by ctid desc) idx, dt
  from sometable ) cx
where idx = 1) dtz,
(
select a from
(select rank() over (order by ctid desc) idx, a
  from sometable where a is not null ) ax 
where idx = 1) az,
(
select b from
(select rank() over (order by ctid desc) idx, b
  from sometable where b is not null ) bx 
where idx = 1) bz,
(
select c from
(select rank() over (order by ctid desc) idx, c
  from sometable where c is not null ) cx
where idx = 1) cz

See it in a fiddle: http://sqlfiddle.com/#!15/d5940/40

The result will be

DT                                   A        B      C
October, 16 2013 00:00:00+0000      abc     died    fred