Question

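Using PostgreSQL 9.4, I'm trying to query time series log data in which a new value is recorded whenever the value updates (rather than on a schedule). The log can update anywhere from several times a minute to once a day.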

I need the query to accomplish the following:

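Filter out excess data by selecting only the first entry in each timestamp range
Fill in sparse data by carrying the last reading forward as the log value. For example, if I group the data by hour and there is an entry at 8 AM with a log value of 10, and the next entry isn't until 11 AM with a log value of 15, I would want the query to return something like this: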

Timestamp        | Value
2015-07-01 08:00 | 10 
2015-07-01 09:00 | 10 
2015-07-01 10:00 | 10 
2015-07-01 11:00 | 15

I have a query that accomplishes the first goal:

with time_range as (
    select hour
    from generate_series('2015-07-01 00:00'::timestamp, '2015-07-02 00:00'::timestamp, '1 hour') as hour
),
ranked_logs as (
    select 
        date_trunc('hour', time_stamp) as log_hour,
        log_val,
        rank() over (partition by date_trunc('hour', time_stamp) order by time_stamp asc)
    from time_series
)
select 
    time_range.hour,
    ranked_logs.log_val
from time_range
left outer join ranked_logs on ranked_logs.log_hour = time_range.hour and ranked_logs.rank = 1;

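But I can't figure out how to fill in the nulls where there is no value. I tried the lag() window function in PostgreSQL, but it doesn't work when there are several nulls in a row.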
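To illustrate why a single lag() falls short, here is a minimal sketch. The CTE name hourly and its sample rows are made up for the example; they are not the real time_series table. lag() looks back exactly one row, so only the first null in a run gets a value.

```sql
-- "hourly" and its rows are hypothetical sample data, not the real table.
with hourly(log_hour, log_val) as (
    values
        ('2015-07-01 01:00'::timestamp, 10),
        ('2015-07-01 02:00'::timestamp, null),
        ('2015-07-01 03:00'::timestamp, null),
        ('2015-07-01 04:00'::timestamp, 15)
)
select
    log_hour,
    log_val,
    -- lag() returns the previous row's log_val, which is itself null at 03:00
    coalesce(log_val, lag(log_val) over (order by log_hour)) as one_step_fill
from hourly
order by log_hour;
```

With these rows, one_step_fill comes out as 10, 10, null, 15: the 03:00 row stays null because the row before it is also null.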

Here is a SQLFiddle demonstrating the problem: http://sqlfiddle.com/#!15/f4d13/5/0

Answer 1

The columns you want are log_hour and first_value.

with time_range as (
    select hour
    from generate_series('2015-07-01 00:00'::timestamp, '2015-07-02 00:00'::timestamp, '1 hour') as hour
),
ranked_logs as (
    select 
        date_trunc('hour', time_stamp) as log_hour,
        log_val,
        rank() over (partition by date_trunc('hour', time_stamp) order by time_stamp asc)
    from time_series
),
base as (
    select
        time_range.hour as lh,
        ranked_logs.log_val
    from time_range
    left outer join ranked_logs on ranked_logs.log_hour = time_range.hour and ranked_logs.rank = 1
)
select
    log_hour,
    log_val,
    value_partition,
    -- every row in a partition takes the value of the partition's first row
    first_value(log_val) over (partition by value_partition order by log_hour)
from (
    select
        date_trunc('hour', base.lh) as log_hour,
        log_val,
        -- running count of non-null values: increments at each new reading
        sum(case when log_val is null then 0 else 1 end) over (order by base.lh) as value_partition
    from base
) as q;

UPDATE

This is what your query returns:

Timestamp        | Value
2015-07-01 01:00 | 10 
2015-07-01 02:00 | null 
2015-07-01 03:00 | null 
2015-07-01 04:00 | 15 
2015-07-01 05:00 | null 
2015-07-01 06:00 | 19 
2015-07-01 08:00 | 13

I want to split this result set into groups like this:

2015-07-01 01:00 | 10       
2015-07-01 02:00 | null     
2015-07-01 03:00 | null    

2015-07-01 04:00 | 15     
2015-07-01 05:00 | null   

2015-07-01 06:00 | 19     

2015-07-01 08:00 | 13

and assign every row in a group the value of the first row in that group (this is done by the last select).

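In this case, the way to get the grouping is to create a column that holds a running count of the non-null values up to and including the current row, and then partition by that column (using sum(case)).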

value | sum(case)
10    | 1
null  | 1
null  | 1
15    | 2   <-- new non-null value, increment
null  | 2
19    | 3   <-- new non-null value, increment
13    | 4   <-- new non-null value, increment
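
The running count in the table above can be reproduced on its own with a small self-contained sketch. The CTE name hourly and its rows are assumptions that mirror the sample result set; they stand in for the base CTE of the full query.

```sql
-- "hourly" is hypothetical sample data standing in for the joined hourly rows.
with hourly(log_hour, log_val) as (
    values
        ('2015-07-01 01:00'::timestamp, 10),
        ('2015-07-01 02:00'::timestamp, null),
        ('2015-07-01 03:00'::timestamp, null),
        ('2015-07-01 04:00'::timestamp, 15),
        ('2015-07-01 05:00'::timestamp, null),
        ('2015-07-01 06:00'::timestamp, 19),
        ('2015-07-01 08:00'::timestamp, 13)
)
select
    log_hour,
    log_val,
    -- adds 1 for each non-null reading seen so far, 0 for each null
    sum(case when log_val is null then 0 else 1 end) over (order by log_hour) as value_partition
from hourly
order by log_hour;
```

For these rows value_partition is 1, 1, 1, 2, 2, 3, 4. Because count(expr) ignores nulls, count(log_val) over (order by log_hour) should give the same numbers if you prefer a shorter expression.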

Now I can partition by sum(case).
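
As a rough end-to-end sketch of that last step, the query below fills the gaps on made-up data. The names hourly, grouped, and filled_val are assumptions introduced for illustration only; the real query would read from the base CTE instead.

```sql
-- Hypothetical sample rows; in the real query these come from the base CTE.
with hourly(log_hour, log_val) as (
    values
        ('2015-07-01 01:00'::timestamp, 10),
        ('2015-07-01 02:00'::timestamp, null),
        ('2015-07-01 03:00'::timestamp, null),
        ('2015-07-01 04:00'::timestamp, 15),
        ('2015-07-01 05:00'::timestamp, null),
        ('2015-07-01 06:00'::timestamp, 19),
        ('2015-07-01 08:00'::timestamp, 13)
),
grouped as (
    select
        log_hour,
        log_val,
        -- running count of non-null readings defines the groups
        sum(case when log_val is null then 0 else 1 end) over (order by log_hour) as value_partition
    from hourly
)
select
    log_hour,
    -- the first row of each group is the non-null reading that opened it
    first_value(log_val) over (partition by value_partition order by log_hour) as filled_val
from grouped
order by log_hour;
```

On this data filled_val is 10, 10, 10, 15, 15, 19, 13. Note that any hours before the very first reading land in group 0, which has no non-null row, so they stay null.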

Fill in & filter irregular time series data
