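Is there a way to compute a weighted moving average with a fixed window size in Amazon Redshift? In more detail: given a table with a date column and a value column, compute for each date a weighted average over a window of a specified size, with the weights specified in an auxiliary table.
My searches so far have turned up plenty of examples of window functions for a simple (unweighted) average, for example here. There are also some related suggestions for postgres, e.g. this SO question, but Redshift's feature set is quite sparse compared with postgres, and it does not support many of the advanced features those suggestions rely on.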
Answer 0 (score: 0)
Suppose we have the following tables:
create temporary table _data (ref_date date, value int);
insert into _data values
('2016-01-01', 34)
, ('2016-01-02', 12)
, ('2016-01-03', 25)
, ('2016-01-04', 17)
, ('2016-01-05', 22)
;
create temporary table _weight (days_in_past int, weight int);
insert into _weight values
(0, 4)
, (1, 2)
, (2, 1)
;
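Then, if we want to compute a moving average over a window of three days (including the current date), where values closer to the current date are given a higher weight than those further in the past, we would expect the following weighted average for 2016-01-05 (based on the values from 2016-01-05, 2016-01-04 and 2016-01-03):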
(22*4 + 17*2 + 25*1) / (4+2+1) = 147 / 7 = 21
The query might look like this:
with _prepare_window as (
    -- one row per (date, date inside its three-day window), joined to its weight
    select
        t1.ref_date
        , datediff(day, t2.ref_date, t1.ref_date) as days_in_past
        , t2.value * _weight.weight as weighted_value
        , _weight.weight
        -- how many dates are actually present in this date's window
        , count(t2.ref_date) over(partition by t1.ref_date rows between unbounded preceding and unbounded following) as num_values_in_window
    from
        _data t1
    left join
        _data t2 on datediff(day, t2.ref_date, t1.ref_date) between 0 and 2
    left join
        _weight on datediff(day, t2.ref_date, t1.ref_date) = _weight.days_in_past
    order by
        t1.ref_date
        , datediff(day, t2.ref_date, t1.ref_date)
)
select
    ref_date
    , round(sum(weighted_value)::float/sum(weight), 0) as weighted_average
from
    _prepare_window
where
    -- keep only dates whose window is complete (all three days present)
    num_values_in_window = 3
group by
    ref_date
order by
    ref_date
;
Which gives the result:
ref_date | weighted_average
------------+------------------
2016-01-03 | 23
2016-01-04 | 19
2016-01-05 | 21
(3 rows)
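To cross-check these figures outside Redshift, here is a minimal Python sketch that recomputes the same weighted averages from the sample data. It is not part of the query above; the function name `weighted_moving_average` and the dictionary layout are illustrative assumptions, and it assumes (as the query does) that only dates with a complete three-day window should be reported.

```python
from datetime import date, timedelta

# Sample rows copied from the _data table above: ref_date -> value.
data = {
    date(2016, 1, 1): 34,
    date(2016, 1, 2): 12,
    date(2016, 1, 3): 25,
    date(2016, 1, 4): 17,
    date(2016, 1, 5): 22,
}

# Rows copied from the _weight table above: days_in_past -> weight.
weights = {0: 4, 1: 2, 2: 1}


def weighted_moving_average(data, weights):
    """Return {ref_date: rounded weighted average} for complete windows only.

    Hypothetical helper for illustration; it mirrors the logic of the SQL
    query but is not a translation of how Redshift executes it.
    """
    result = {}
    for ref_date in sorted(data):
        # Look up the value for each offset in the window (None if missing).
        window = [
            (data.get(ref_date - timedelta(days=offset)), weight)
            for offset, weight in weights.items()
        ]
        # Skip dates whose window is incomplete, like num_values_in_window = 3.
        if any(value is None for value, _ in window):
            continue
        weighted_sum = sum(value * weight for value, weight in window)
        # Note: Python's round() rounds exact halves to the nearest even
        # integer, which may differ from Redshift's round() on exact ties;
        # no ties occur in this sample data.
        result[ref_date] = round(weighted_sum / sum(weights.values()))
    return result


averages = weighted_moving_average(data, weights)
for ref_date, average in averages.items():
    print(ref_date, average)
```

For the sample data this reproduces the three rows shown above: 158/7, 130/7 and 147/7, rounded to 23, 19 and 21.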