I want to use SQL to efficiently stretch a time series to a different length. Suppose I have the following data:
-- drop table if exists time_series;
create table time_series (
id serial,
val numeric)
;
insert into time_series (val) values
(1), (2), (3), (4), (5), (6),
(5), (4), (3), (2), (1);
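This time series has length 11, and I want to stretch it to length 15 so that the sum of the values in the stretched series equals the sum of the values in the original series. I have a solution, but it is not efficient: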
select
new_id,
sum(new_val) as new_val
from
(
select
id,
val/15.0 as new_val,
ceil(row_number() over(order by id, gs) / 11.0) as new_id
from
time_series
cross join (select generate_series(1, 15) gs) gs
) raw_data
group by
new_id
order by
new_id
;
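This first creates a table with 15 * 11 rows and then collapses it back to 15 rows.
While this works for small time series, performance degrades significantly as the series gets longer. For example, to stretch 2,000 rows to 3,000 rows, the query must first generate 6M rows (which takes 30 seconds on my laptop).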
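To check that a candidate solution keeps the total unchanged, a query along the following lines can help. It is only a sketch: it inlines the 11 sample values as a CTE (with ids assumed to be 1 to 11) so that it runs on its own, and the CTE names pieces and stretched are illustrative. Because val / 15.0 is rounded, the stretched total is rounded before comparing.

```sql
-- Sanity check (sketch): compare the total before and after stretching 11 -> 15.
with time_series (id, val) as (values
    (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6),
    (7, 5), (8, 4), (9, 3), (10, 2), (11, 1)
),
pieces as (
    -- split every value into 15 equal pieces and number the pieces in order
    select
        val / 15.0 as new_val,
        ceil(row_number() over (order by id, gs) / 11.0) as new_id
    from time_series
    cross join generate_series(1, 15) gs
),
stretched as (
    -- every 11 consecutive pieces form one element of the new series
    select new_id, sum(new_val) as new_val
    from pieces
    group by new_id
)
select
    (select sum(val) from time_series) as original_sum,
    (select round(sum(new_val), 6) from stretched) as stretched_sum,
    (select count(*) from stretched) as stretched_rows;
```

The two sums should agree up to rounding, and stretched_rows should equal the target length.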
Test data:
insert into time_series (val) select generate_series(1, 1000);
insert into time_series (val) select generate_series(1000, 1, -1);
Is there a more efficient solution in SQL that gives the same result?
Answer 0 (score: 1)
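Try this query, which avoids the cross join.
First we build the subquery ts1, which holds the value intervals (each row pairs a value with the next one), and then join it to the new sequence. In the select list, new_val is computed by linearly interpolating the new ID within the joined value interval.
Also, in this query we use +1 and -1 to convert the sequence 1,2,3,... to 0,1,2,... and back.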
select
gs as new_id,
Sval+(Eval-SVal)*((gs.gs-1) /(100.0/(11.0-1))+1-ts1.ID) as new_val,
SVal as StartInterval,
EVal as EndInterval
from
(Select generate_series(1, 100) gs) gs
left join
(select T1.ID, T1.Val SVal,T2.Val EVal
FROM
time_series T1
JOIN time_series T2 ON T1.Id=T2.ID-1) ts1
ON floor((gs.gs-1) /(100.0/(11.0-1)))+1=ts1.ID
order by
gs
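Linear interpolation on its own does not keep the total equal to the original sum, so if that requirement matters, one option is to rescale the interpolated values afterwards. The following is only a sketch under assumptions that are not part of the query above: the target length 15 and the CTE names params, positions and interpolated are illustrative, ids are assumed to be consecutive starting at 1, and the interpolated values are assumed to have a non-zero sum so that the rescaling divisor is valid. Rescaling multiplies every value by the same factor, so the result is generally not identical to the piece-splitting query in the question.

```sql
with time_series (id, val) as (values
    (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6),
    (7, 5), (8, 4), (9, 3), (10, 2), (11, 1)
),
params as (
    -- ntsl = new length (assumed), otsl = old length, total = sum to preserve
    select 15 as ntsl, count(*) as otsl, sum(val) as total
    from time_series
),
positions as (
    -- map every new id to a fractional position on the old id axis
    select
        gs as new_id,
        (gs - 1) * (p.otsl - 1)::numeric / p.ntsl + 1 as old_pos,
        p.total
    from params p
    cross join generate_series(1, p.ntsl) gs
),
interpolated as (
    -- linear interpolation between the two neighbouring old values
    select
        ps.new_id,
        t1.val + (t2.val - t1.val) * (ps.old_pos - t1.id) as raw_val,
        ps.total
    from positions ps
    join time_series t1 on t1.id = floor(ps.old_pos)
    join time_series t2 on t2.id = t1.id + 1
)
select
    new_id,
    -- rescale so that the new values add up to the original total
    raw_val * total / sum(raw_val) over () as new_val
from interpolated
order by new_id;
```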
Answer 1 (score: 0)
I figured it out. To stretch a time series of 5 elements to one of 30 elements while preserving the sum of the values, you can use:
with time_series (id, val) as (values
(1, 1),
(2, 2),
(3, 3),
(4, 2),
(5, 1)
)
, mapping_to_old_ts_ids as (
select
gs as new_id,
case when mod(((gs - 1) * otsl + 1), ntsl) <> 0 then ((gs - 1) * otsl + 1) / ntsl + 1 else ((gs - 1) * otsl + 1) / ntsl end as old_id_start,
case mod(((gs - 1) * otsl + 1), ntsl) when 0 then ntsl else mod(((gs - 1) * otsl + 1), ntsl) end as old_id_start_piece,
case when mod((gs * otsl), ntsl) <> 0 then (gs * otsl) / ntsl + 1 else (gs * otsl) / ntsl end as old_id_end,
case mod((gs * otsl), ntsl) when 0 then ntsl else mod((gs * otsl), ntsl) end as old_id_end_piece,
ntsl
from
(select generate_series(1, ntsl) as gs, ntsl from (select 30 as ntsl) a) new_time_series
cross join (select count(*) as otsl from time_series) old_time_series_length
)
select
new_id,
case
when old_id_start = old_id_end then (old_id_end_piece - old_id_start_piece + 1) / ntsl::numeric * ts1.val
when old_id_start <> old_id_end then (ntsl::numeric - old_id_start_piece +1 ) / ntsl::numeric * ts1.val + coalesce((old_id_end_piece / ntsl::numeric * ts2.val), 0) end
from
mapping_to_old_ts_ids oid
join time_series ts1 on (oid.old_id_start = ts1.id)
left join time_series ts2 on (oid.old_id_end = ts2.id)
order by
new_id
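The query above is already a simplified version of my original, more verbose query. If you are interested, this is how I gradually arrived at the solution (here stretching 5 rows to 8 rows):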
with time_series (id, val) as (values
(1, 1),
(2, 2),
(3, 3),
(4, 2),
(5, 1)
)
/* The basic idea is to divide every element into 8 pieces and then aggregate the
pieces back in groups of 5. When stretching 5 into 8, we will have 5 * 8 = 40
pieces. For every element in the new time series we can calculate the ids
of its first and last piece. */
, piece_start_end as (
select
gs as new_id,
(gs - 1) * 5 + 1 as piece_start,
gs * 5 as piece_end
from
generate_series(1, 8) gs
)
/* Now we need to calculate where exactly in the old time series each run of pieces
begins and ends. E.g. the 1st element of the new time series starts in element 1 at position 1
and ends in element 1 at position 5. The 2nd element of the new time series starts in element 1
at position 6 and ends in element 2 at position 2. */
, mapping_to_old_ts_ids as (
select
*,
case when mod(piece_start, 8) <> 0 then piece_start / 8 + 1 else piece_start / 8 end as old_id_start,
case mod(piece_start, 8) when 0 then 8 else mod(piece_start, 8) end as old_id_start_piece,
case when mod(piece_end, 8) <> 0 then piece_end / 8 + 1 else piece_end / 8 end as old_id_end,
case mod(piece_end, 8) when 0 then 8 else mod(piece_end, 8) end as old_id_end_piece
from
piece_start_end
)
/* In the final step we just need to assign the final value to the new time series by taking
the appropriate number of pieces from the old time series elements. */
select
new_id,
old_id_start,
old_id_start_piece,
ts1.val as old_id_start_val,
old_id_end,
old_id_end_piece,
ts2.val as old_id_end_val,
case
when old_id_start = old_id_end then (old_id_end_piece - old_id_start_piece + 1) / 8.0 * ts1.val
when old_id_start <> old_id_end then (8 - old_id_start_piece +1 ) / 8.0 * ts1.val + coalesce((old_id_end_piece / 8.0 * ts2.val), 0) end
from
mapping_to_old_ts_ids oid
join time_series ts1 on (oid.old_id_start = ts1.id)
left join time_series ts2 on (oid.old_id_end = ts2.id)
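As a cross-check on the piece-counting logic, the same values can also be obtained by treating each old element as the unit interval (id - 1, id] and each new element as an interval of width otsl / ntsl on the same axis, then weighting every old value by the length of the overlap. This is a minimal sketch rather than a replacement for the queries above: the names params, new_buckets, lo and hi are illustrative, ids are assumed to be consecutive starting at 1, and the results are exact only up to numeric rounding of the division. Since it sums over all overlapping old elements, it does not depend on a new element spanning at most two old ones.

```sql
with time_series (id, val) as (values
    (1, 1), (2, 2), (3, 3), (4, 2), (5, 1)
),
params as (
    -- ntsl = new length (assumed 8 here), otsl = old length
    select 8 as ntsl, count(*) as otsl from time_series
),
new_buckets as (
    -- interval covered by each new element, measured in old-element units
    select
        gs as new_id,
        (gs - 1) * p.otsl::numeric / p.ntsl as lo,
        gs * p.otsl::numeric / p.ntsl as hi
    from params p
    cross join generate_series(1, p.ntsl) gs
)
select
    b.new_id,
    -- overlap length between the bucket and old element t, times its value
    sum(t.val * (least(b.hi, t.id) - greatest(b.lo, t.id - 1))) as new_val
from new_buckets b
join time_series t
    on t.id between floor(b.lo) + 1 and ceil(b.hi)
group by b.new_id
order by b.new_id;
```

For the 5-row example this joins each new row to only one or two old rows, so the intermediate result stays close to the size of the output instead of growing to otsl * ntsl rows.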