Is Last Observation Carried Forward (LOCF) implemented in PostgreSQL?

Date: 2014-12-09 19:33:20

Tags: postgresql missing-data

Does PostgreSQL implement the data imputation method Last Observation Carried Forward (LOCF)? If not, how can I implement it?

2 answers:

Answer 0 (score: 0)

I based this table and its data directly on the table in the linked article.

create table test (
  unit integer not null
    check (unit >= 1),
  obs_time integer not null
    check (obs_time >= 1),
  obs_value numeric(5, 1),
  primary key (unit, obs_time)
);

insert into test values
(1, 1, 3.8), (1, 2, 3.1), (1, 3, 2.0),
(2, 1, 4.1), (2, 2, 3.5), (2, 3, 3.8), (2, 4, 2.4), (2, 5, 2.8), (2, 6, 3.0),
(3, 1, 2.7), (3, 2, 2.4), (3, 3, 2.9), (3, 4, 3.5);

For the six observation times in the linked article, we need every possible combination of unit and obs_time.

select distinct unit, times.obs_time 
from test
cross join (select generate_series(1, 6) obs_time) times;
unit  obs_time
--
1     1
1     2
1     3
1     4
1     5
1     6
2     1
. . .
3     6
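Because generate_series() is a set-returning function that can also appear directly in FROM, the same grid can be sketched by deduplicating the units first and then cross joining the series. DISTINCT then runs over the 13 rows of test instead of all 78 unit/time pairs. This assumes the test table above; the upper bound 6 is still hard-coded, as in the article.

```sql
-- Deduplicate the units first, then pair each unit with times 1..6.
-- To follow the data instead of a fixed bound, generate_series(1, 6) could be
-- replaced by generate_series(1, (select max(obs_time) from test)).
select u.unit, g.obs_time
from (select distinct unit from test) u
cross join generate_series(1, 6) as g(obs_time)
order by u.unit, g.obs_time;
```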

We also need to know which row of each unit holds the last observed value.

select unit, max(obs_time) obs_time
from test
group by unit
order by unit;
unit  obs_time
--
1     3
2     6
3     4
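PostgreSQL's DISTINCT ON keeps only the first row of each group under the query's ORDER BY, so a sketch like the following returns each unit's last obs_time together with its obs_value in a single query. It could stand in for the join between test and last_obs_time inside the scalar subquery below. It assumes the test table above.

```sql
-- One row per unit: the row with the largest obs_time, including its value.
-- The ORDER BY must start with the DISTINCT ON expression (unit).
select distinct on (unit) unit, obs_time, obs_value
from test
order by unit, obs_time desc;
-- unit | obs_time | obs_value
--    1 |        3 |       2.0
--    2 |        6 |       3.0
--    3 |        4 |       3.5
```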

With these two sets, we can join them and use coalesce() to carry the last observation forward.

with unit_times as (
  select distinct unit, times.obs_time 
  from test
  cross join (select generate_series(1, 6) obs_time) times
), last_obs_time as (
  select unit, max(obs_time) obs_time
  from test
  group by unit
)
select t1.unit, t1.obs_time, 
       coalesce(t2.obs_value, (select obs_value 
                               from test 
                               inner join last_obs_time 
                                  on test.unit = last_obs_time.unit 
                                 and test.obs_time = last_obs_time.obs_time 
                               where test.unit = t1.unit)) obs_value
from unit_times t1
left join test t2 
       on t1.unit = t2.unit and t1.obs_time = t2.obs_time
order by t1.unit, t1.obs_time;
unit obs_time  obs_value
--
1    1         3.8
1    2         3.1
1    3         2.0
1    4         2.0
1    5         2.0
1    6         2.0
2    1         4.1
. . . 
3    4         3.5
3    5         3.5
3    6         3.5
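The query above fills every gap with the unit's final observation (taken from last_obs_time). That matches LOCF here only because each unit's missing times all come after its last observation. If a time in the middle of a series were missing, or stored with a NULL obs_value, LOCF should use the most recent earlier value instead. A sketch of that variant, assuming the same test table, uses a correlated subquery that picks the latest non-null value at or before each time; for the sample data it returns the same rows as above.

```sql
-- LOCF with a correlated subquery: for each (unit, obs_time) in the grid,
-- take the latest non-null obs_value at or before that time.
-- Times before a unit's first observation stay NULL.
-- The primary key index on (unit, obs_time) can serve each lookup.
select u.unit, g.obs_time,
       (select t.obs_value
          from test t
         where t.unit = u.unit
           and t.obs_time <= g.obs_time
           and t.obs_value is not null
         order by t.obs_time desc
         limit 1) as obs_value
from (select distinct unit from test) u
cross join generate_series(1, 6) as g(obs_time)
order by u.unit, g.obs_time;
```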

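To get the same visual layout as the linked article (one row per unit, one column per time), use the crosstab() function from the tablefunc module. You could also do the pivot in application code.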
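A sketch of the crosstab() call, assuming the tablefunc extension is installed on the server and you have the privilege to create it. It uses the two-argument form, whose second query lists the categories (here the times 1 to 6), so a missing time becomes a NULL column instead of shifting later values to the left. This version pivots the raw test rows; to pivot the carried-forward values, pass the LOCF query as the first argument instead, keeping the (unit, obs_time, value) column order and ordering by unit. The output column names t1 to t6 are arbitrary.

```sql
-- tablefunc is a contrib module shipped with PostgreSQL.
create extension if not exists tablefunc;

select *
from crosstab(
  -- source query: row name, category, value; rows grouped by row name
  'select unit, obs_time, obs_value from test order by 1, 2',
  -- category query: the output columns, in order
  'select generate_series(1, 6)'
) as ct(unit integer,
        t1 numeric, t2 numeric, t3 numeric,
        t4 numeric, t5 numeric, t6 numeric);
```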

Answer 1 (score: 0)

The following code assumes a table tbl with key columns a and b, a time column t, and a column v holding the values to carry forward:

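-- State transition function: a is the running state (the last non-null
-- value seen so far), b is the current row's value. A NULL b keeps a.
-- It is not STRICT, so it is also called for NULL inputs.
create or replace function locf_s(a float, b float)
returns float
language sql
immutable
as $$
  select coalesce(b, a)
$$;

-- No INITCOND, so the state starts out NULL.
drop aggregate if exists locf(float);
create aggregate locf(float) (
  sfunc = locf_s,
  stype = float
);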

select a,b,t,v,
    locf(v) over (PARTITION by a,b ORDER by t) as v_locf
from tbl
order by a,b,t
;
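The two answers can be combined. A sketch that builds Answer 0's unit/time grid, left joins the observations, and lets the locf aggregate fill the resulting NULLs, assuming the test table from Answer 0 and the locf(float) aggregate defined above. obs_value is cast to float because the aggregate is declared for float only. Because the window has an ORDER BY, its default frame runs from the start of the partition to the current row, so each row sees the last non-null value so far. obs_time must be unique within a unit: rows that tie in the window's ORDER BY are peers and all receive the same value.

```sql
-- Grid of every (unit, obs_time), with missing observations as NULL,
-- then carry the last non-null value forward within each unit.
select u.unit, g.obs_time,
       (locf(t.obs_value::float)
          over (partition by u.unit order by g.obs_time))::numeric(5, 1) as obs_value
from (select distinct unit from test) u
cross join generate_series(1, 6) as g(obs_time)
left join test t
       on t.unit = u.unit and t.obs_time = g.obs_time
order by u.unit, g.obs_time;
```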


For a tutorial, see "LOCF and Linear Imputation with PostgreSQL".