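I want to compute a moving average based on timestamps. I have two columns, Temperature and Timestamps (date and time), and I want to compute the moving average over every 15 minutes of consecutive temperature observations. In other words, the rows that go into each average are selected by a 15-minute time interval. The number of observations can also differ from one window to the next: all windows are the same size (15 minutes), but each window may contain a different number of observations. For example, the first window might average n observations and the second window n + 5 observations.
Data sample: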
ID  Timestamps           Temperature
1   2007-09-14 22:56:12  5.39
2   2007-09-14 22:58:12  5.34
3   2007-09-14 23:00:12  5.16
4   2007-09-14 23:02:12  5.54
5   2007-09-14 23:04:12  5.30
6   2007-09-14 23:06:12  5.20
7   2007-09-14 23:10:12  5.39
8   2007-09-14 23:12:12  5.34
9   2007-09-14 23:20:12  5.16
10  2007-09-14 23:24:12  5.54
11  2007-09-14 23:30:12  5.30
12  2007-09-14 23:33:12  5.20
13  2007-09-14 23:40:12  5.39
14  2007-09-14 23:42:12  5.34
15  2007-09-14 23:44:12  5.16
16  2007-09-14 23:50:12  5.54
17  2007-09-14 23:52:12  5.30
18  2007-09-14 23:57:12  5.20
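Main challenge:
Because the sampling frequency varies, the observations do not fall on exact 15-minute boundaries. How can I write code that still identifies each 15-minute window?
Answer 0 (score: 9)
You can join your table with itself: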
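-- for each row l1, average every row l2 that is not later than l1
-- and lies within the 15 minutes before it
select l1.id, avg( l2.Temperature )
from l l1
inner join l l2
   on l2.id <= l1.id and
      l2.Timestamps + interval '15 minutes' > l1.Timestamps
group by l1.id
order by l1.id
;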
| ID |            AVG |
|----|----------------|
|  1 |           5.39 |
|  2 |          5.365 |
|  3 | 5.296666666667 |
|  4 |         5.3575 |
|  5 |          5.346 |
|  6 | 5.321666666667 |
|  7 | 5.331428571429 |
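Note: this only does the "hard work". You should join the result with the original table, or add further columns to the query, to get the output you need. I don't know what your final query should look like, so adapt this solution or ask for more help.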
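As an alternative to the self-join, here is a minimal sketch that uses a window frame instead. It is a different technique from the query above, and it assumes PostgreSQL 11 or later (where `RANGE` frames accept an interval offset) and the same table `l` with columns `id`, `Timestamps` and `Temperature`. Because the original columns are selected directly, no join back to the table is needed.

```sql
-- Sketch only: requires PostgreSQL 11+ for RANGE frames with an interval offset.
-- Table name "l" and its columns are taken from the self-join query above.
select id,
       Timestamps,
       Temperature,
       avg(Temperature) over (
           order by Timestamps
           -- every row from 15 minutes before the current row up to the current row
           range between interval '15 minutes' preceding and current row
       ) as rolling_avg
from l
order by Timestamps;
```

One difference to be aware of: this frame includes a row that is exactly 15 minutes older than the current row, whereas the self-join's strict `>` comparison excludes it. In the sample data this affects row 18 (23:57:12), which is exactly 15 minutes after row 14 (23:42:12).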
Answer 1 (score: 6)
Assuming you want the rolling average to restart after each 15-minute interval:
select id,
       temp,
       avg(temp) over (partition by group_nr order by time_read) as rolling_avg
from (
  select id,
         temp,
         time_read,
         interval_group,
         id - row_number() over (partition by interval_group order by time_read) as group_nr
  from (
    select id,
           time_read,
           'epoch'::timestamp + '900 seconds'::interval * (extract(epoch from time_read)::int4 / 900) as interval_group,
           temp
    from readings
  ) t1
) t2
order by time_read;
It is based on Depesz's solution for grouping by "time ranges".
Here is an SQLFiddle example: http://sqlfiddle.com/#!1/0f3f0/2
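To see what the `interval_group` expression in the innermost query does, the following sketch applies it to three timestamps from the question. The inline `values` list and the alias `t` are only there for illustration. Each timestamp is truncated down to the start of its 900-second (15-minute) bucket.

```sql
-- Illustration only: the same bucket expression applied to three sample timestamps.
select t.time_read,
       'epoch'::timestamp
         + '900 seconds'::interval * (extract(epoch from t.time_read)::int4 / 900) as interval_group
from (values
        (timestamp '2007-09-14 22:56:12'),
        (timestamp '2007-09-14 23:00:12'),
        (timestamp '2007-09-14 23:44:12')
     ) as t(time_read);
-- 2007-09-14 22:56:12 -> 2007-09-14 22:45:00
-- 2007-09-14 23:00:12 -> 2007-09-14 23:00:00
-- 2007-09-14 23:44:12 -> 2007-09-14 23:30:00
```

The integer division by 900 discards the seconds elapsed inside the bucket, so all readings in the same quarter hour share one `interval_group` value.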
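Answer 2 (score: 3)
Here is an approach that relies on the ability to use an aggregate function as a window function. The aggregate keeps the last 15 minutes of observations in an array, together with the current running total. The state transition function shifts elements that have fallen behind the 15-minute window off the array and pushes the latest observation onto it. The final function simply computes the mean temperature of the observations in the array.
Now, as to whether this is a benefit... it depends. It puts the work on the plpgsql-execution side of postgresql rather than the database-access side, and in my own experience plpgsql is not fast. If you can easily go back to the table to look up the previous 15 minutes of rows for each observation, a self-join (as in @danihp's answer) will do well. However, this approach can handle observations that come from some more complex source, where those lookups are not practical. As always, try it and compare on your own system.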
-- based on using this table definition
create table observation(id int primary key, timestamps timestamp not null unique,
temperature numeric(5,2) not null);
-- note that I'm reusing the table structure as a type for the state here
create type rollavg_state as (memory observation[], total numeric(5,2));
create function rollavg_func(state rollavg_state, next_in observation) returns rollavg_state immutable language plpgsql as $$
declare
    cutoff timestamp;
    i int;
    updated_memory observation[];
begin
    raise debug 'rollavg_func: state=%, next_in=%', state, next_in;
    cutoff := next_in.timestamps - '15 minutes'::interval;
    i := array_lower(state.memory, 1);
    raise debug 'cutoff is %', cutoff;
    -- drop observations that have fallen out of the 15-minute window
    while i <= array_upper(state.memory, 1) and state.memory[i].timestamps < cutoff loop
        raise debug 'shifting %', state.memory[i].timestamps;
        -- subtract the expired observation before advancing the index
        state.total := state.total - state.memory[i].temperature;
        i := i + 1;
    end loop;
    state.memory := array_append(state.memory[i:array_upper(state.memory, 1)], next_in);
    state.total := coalesce(state.total, 0) + next_in.temperature;
    return state;
end
$$;
create function rollavg_output(state rollavg_state) returns float8 immutable language plpgsql as $$
begin
    raise debug 'rollavg_output: state=% len=%', state, array_length(state.memory, 1);
    if array_length(state.memory, 1) > 0 then
        return state.total / array_length(state.memory, 1);
    else
        return null;
    end if;
end
$$;
create aggregate rollavg(observation) (sfunc = rollavg_func, finalfunc = rollavg_output, stype = rollavg_state);
-- referring to just a table name means a tuple value of the row as a whole, whose type is the table type
-- the aggregate relies on inputs arriving in ascending timestamp order
select rollavg(observation) over (order by timestamps) from observation;
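To try the aggregate out, a small usage sketch follows. It assumes the `observation` table, the `rollavg_state` type and the `rollavg` aggregate above have already been created, and it loads the first eight rows of the question's sample data. The extra output columns and the `rolling_avg` alias are illustrative choices rather than part of the original answer.

```sql
-- Load the first eight sample observations from the question.
insert into observation (id, timestamps, temperature) values
    (1, '2007-09-14 22:56:12', 5.39),
    (2, '2007-09-14 22:58:12', 5.34),
    (3, '2007-09-14 23:00:12', 5.16),
    (4, '2007-09-14 23:02:12', 5.54),
    (5, '2007-09-14 23:04:12', 5.30),
    (6, '2007-09-14 23:06:12', 5.20),
    (7, '2007-09-14 23:10:12', 5.39),
    (8, '2007-09-14 23:12:12', 5.34);

-- Show each observation next to its rolling 15-minute average.
select id,
       timestamps,
       temperature,
       rollavg(observation) over (order by timestamps) as rolling_avg
from observation
order by timestamps;
```

For row 8 (23:12:12) the cutoff is 22:57:12, so the first observation (22:56:12) is the one that drops out of the window at that point. Running the query with `set client_min_messages = debug;` shows the `raise debug` output from the two functions, which can help when checking how the state evolves.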