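Fill in & filter irregular time-series data

Time: 2015-07-29 20:48:26

Tags: sql postgresql

Using PostgreSQL 9.4, I am trying to write a query against time-series log data that records a new value whenever the value is updated (rather than on a schedule). The log can receive anything from several updates per minute to one update per day.

I need the query to accomplish the following: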

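  1. Filter out excess data by selecting only the first entry in each timestamp range.
  2. Fill in sparse data by using the last reading as the log value. For example, if I group the data by hour and there is an entry at 8 AM with a log value of 10, and the next entry is not until 11 AM with a log value of 15, I want the query to return something like this:

    Timestamp        | Value
    2015-07-01 08:00 | 10
    2015-07-01 09:00 | 10
    2015-07-01 10:00 | 10
    2015-07-01 11:00 | 15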
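
To make the two goals concrete outside of SQL, here is a minimal Python sketch of the intended behaviour. The function name `hourly_fill` and the extra 08:20 reading are hypothetical and only serve the illustration; the sketch assumes the readings arrive as `(timestamp, value)` pairs.

```python
from datetime import datetime, timedelta


def hourly_fill(entries, start, end):
    """Return one (hour, value) row per hour from start to end, inclusive.

    entries: iterable of (timestamp, value) pairs in any order.
    Each hour uses the first entry that falls inside it; an hour with
    no entry reuses the value of the most recent earlier hour.
    """
    # Goal 1: keep only the earliest entry of each hour.
    first_in_hour = {}
    for ts, value in sorted(entries):
        hour = ts.replace(minute=0, second=0, microsecond=0)
        first_in_hour.setdefault(hour, value)

    # Goal 2: walk the hour grid and carry the last seen value forward.
    rows = []
    last_value = None  # stays None until the first reading appears
    hour = start
    while hour <= end:
        if hour in first_in_hour:
            last_value = first_in_hour[hour]
        rows.append((hour, last_value))
        hour += timedelta(hours=1)
    return rows


readings = [
    (datetime(2015, 7, 1, 8, 0), 10),
    (datetime(2015, 7, 1, 8, 20), 12),  # hypothetical extra entry, dropped by goal 1
    (datetime(2015, 7, 1, 11, 0), 15),
]
for hour, value in hourly_fill(readings, datetime(2015, 7, 1, 8), datetime(2015, 7, 1, 11)):
    print(hour.strftime("%Y-%m-%d %H:%M"), "|", value)
```

Hours before the first reading come back as `None`, which mirrors the nulls a SQL version would leave at the start of the range.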
    

I have a query that accomplishes the first goal:

    with time_range as (
        select hour
        from generate_series('2015-07-01 00:00'::timestamp, '2015-07-02 00:00'::timestamp, '1 hour') as hour
    ),
    ranked_logs as (
        select 
            date_trunc('hour', time_stamp) as log_hour,
            log_val,
            rank() over (partition by date_trunc('hour', time_stamp) order by time_stamp asc)
        from time_series
    )
    select 
        time_range.hour,
        ranked_logs.log_val
    from time_range
    left outer join ranked_logs on ranked_logs.log_hour = time_range.hour and ranked_logs.rank = 1;
    

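But I can't figure out how to fill in the nulls for the hours that have no value. I tried the lag() window function in PostgreSQL, but it does not help when there are several nulls in a row.

Here is a SQLFiddle that demonstrates the problem: http://sqlfiddle.com/#!15/f4d13/5/0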
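
The lag() limitation can be reproduced with a small Python emulation. This is only a sketch: the question does not show the attempted query, so it assumes the attempt looked roughly like `coalesce(log_val, lag(log_val) over (order by hour))`, and the sample values are made up.

```python
# None stands in for SQL NULL; the values are made up for illustration.
log_vals = [10, None, None, 15]

# lag(log_val) over (order by hour): the value of the previous row,
# which is NULL for the very first row.
lagged = [None] + log_vals[:-1]

# coalesce(log_val, lag(log_val) ...): fall back to the previous row only.
filled = [v if v is not None else prev for v, prev in zip(log_vals, lagged)]

print(filled)  # [10, 10, None, 15]
```

The second null stays unfilled because lag() looks exactly one row back, and that row is itself null.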

1 answer:

Answer 0 (score: 1)

Your columns are log_hour and first_value.

    with time_range as (
        -- one row per hour in the reporting window
        select hour
        from generate_series('2015-07-01 00:00'::timestamp, '2015-07-02 00:00'::timestamp, '1 hour') as hour
    ),
    ranked_logs as (
        -- rank the entries inside each hour, earliest first
        select
            date_trunc('hour', time_stamp) as log_hour,
            log_val,
            rank() over (partition by date_trunc('hour', time_stamp) order by time_stamp asc)
        from time_series
    ),
    base as (
        -- the query from the question: first entry per hour, null for empty hours
        select
            time_range.hour as lh,
            ranked_logs.log_val
        from time_range
        left outer join ranked_logs on ranked_logs.log_hour = time_range.hour and ranked_logs.rank = 1
    )
    select
        log_hour,
        log_val,
        value_partition,
        -- every row takes the first value of its partition
        first_value(log_val) over (partition by value_partition order by log_hour) as first_value
    from (
        select
            date_trunc('hour', base.lh) as log_hour,
            log_val,
            -- running count of non-null values; it changes only when a new reading appears
            sum(case when log_val is null then 0 else 1 end) over (order by base.lh) as value_partition
        from base
    ) as q;

Update

This is what your query returns:

Timestamp        | Value
2015-07-01 01:00 | 10 
2015-07-01 02:00 | null 
2015-07-01 03:00 | null 
2015-07-01 04:00 | 15 
2015-07-01 05:00 | null 
2015-07-01 06:00 | 19 
2015-07-01 08:00 | 13 

I want to split this result set into groups like this:

2015-07-01 01:00 | 10       
2015-07-01 02:00 | null     
2015-07-01 03:00 | null    

2015-07-01 04:00 | 15     
2015-07-01 05:00 | null   

2015-07-01 06:00 | 19     

2015-07-01 08:00 | 13   

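and assign to every row in a group the value of the first row of that group (this is what the final select does).

In this case, the way to obtain the groups is to create a column that holds the number of non-null values counted up to and including the current row, and then to partition by that column. This is what sum(case) is used for: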

value | sum(case)
10    | 1
null  | 1
null  | 1
15    | 2   <-- new non-null value, increment
null  | 2
19    | 3   <-- new non-null value, increment
13    | 4   <-- new non-null value, increment

Now I can partition by sum(case).
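
The two window-function steps can also be traced by hand. The following Python sketch emulates them on the values from the table above; it is an illustration of the idea rather than a replacement for the SQL, and the variable names are chosen here to echo the query's column names.

```python
from itertools import accumulate

# Hourly values from the example above; None stands in for SQL NULL.
log_vals = [10, None, None, 15, None, 19, 13]

# sum(case when log_val is null then 0 else 1 end) over (order by ...):
# a running count of the non-null values seen so far, current row included.
value_partition = list(accumulate(0 if v is None else 1 for v in log_vals))

# first_value(log_val) over (partition by value_partition order by ...):
# the first row of each partition is its only non-null row, so remember it.
first_in_partition = {}
for part, v in zip(value_partition, log_vals):
    first_in_partition.setdefault(part, v)

filled = [first_in_partition[part] for part in value_partition]

print(value_partition)  # [1, 1, 1, 2, 2, 3, 4]
print(filled)           # [10, 10, 10, 15, 15, 19, 13]
```

If the series starts with nulls, those rows fall into partition 0, which has no reading to borrow from, so they stay null.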