Weighted moving average in Amazon Redshift

Time: 2015-01-26 13:40:37

Tags: amazon-redshift

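Is there a way to compute a weighted moving average with a fixed window size in Amazon Redshift? In more detail: given a table with a date column and a value column, compute for each date the weighted average over a window of a specified size, with the weights specified in an auxiliary table.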

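My searches so far have turned up plenty of examples of window functions for a simple (unweighted) average, for example here. There are also some related suggestions for postgres, e.g. this SO question, but Redshift's feature set is quite sparse compared with postgres, and it does not support many of the advanced features those suggestions rely on.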

1 answer:

Answer 0: (score: 0)

Suppose we have the following tables:

create temporary table _data (ref_date date, value int);
insert into _data values
    ('2016-01-01', 34)
  , ('2016-01-02', 12)
  , ('2016-01-03', 25)
  , ('2016-01-04', 17)
  , ('2016-01-05', 22)
;

create temporary table _weight (days_in_past int, weight int);
insert into _weight values
    (0, 4)
  , (1, 2)
  , (2, 1)
;

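Then, if we want to compute a moving average over a three-day window (including the current date), where values closer to the current date get a higher weight than those further in the past, we would expect the following weighted average for 2016-01-05 (based on the values for 2016-01-05, 2016-01-04 and 2016-01-03):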

(22*4 + 17*2 + 25*1) / (4+2+1) = 147 / 7 = 21
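
In general terms, the calculation above can be written as the formula below. The notation is an assumption introduced here and is not part of the original answer: v_{d-k} stands for the value k days before date d, and w_k for the weight stored in _weight for days_in_past = k.

```latex
\mathrm{WMA}(d) = \frac{\sum_{k=0}^{2} w_k \, v_{d-k}}{\sum_{k=0}^{2} w_k},
\qquad w_0 = 4,\; w_1 = 2,\; w_2 = 1
```

The same formula applied to 2016-01-04 gives (17*4 + 25*2 + 12*1) / 7 = 130 / 7, roughly 18.57, and applied to 2016-01-03 gives (25*4 + 12*2 + 34*1) / 7 = 158 / 7, roughly 22.57.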

The query could look like this:

with _prepare_window as (
    select
        t1.ref_date
      , datediff(day, t2.ref_date, t1.ref_date) as days_in_past
      , t2.value * weight as weighted_value
      , weight
      , count(t2.ref_date) over(partition by t1.ref_date rows between unbounded preceding and unbounded following) as num_values_in_window
    from
        _data t1
    left join
        _data t2 on datediff(day, t2.ref_date, t1.ref_date) between 0 and 2
    left join
        _weight on datediff(day, t2.ref_date, t1.ref_date) = days_in_past
    order by
        t1.ref_date
      , datediff(day, t2.ref_date, t1.ref_date)
)
select
    ref_date
  , round(sum(weighted_value)::float/sum(weight), 0) as weighted_average
from
    _prepare_window
where
    -- keep only dates that have all three days of the window available
    num_values_in_window = 3
group by
    ref_date
order by
    ref_date
;

which gives the result:

  ref_date  | weighted_average
------------+------------------
 2016-01-03 |               23
 2016-01-04 |               19
 2016-01-05 |               21
(3 rows)
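
As a possible alternative, the same window can be sketched with the lag window function instead of a self-join. This is only a sketch under two assumptions that the query above does not need: _data holds exactly one row per calendar day with no gaps (lag counts rows, not days), and the weights 4, 2, 1 are hardcoded rather than read from _weight. The names _lagged, v0, v1 and v2 are made up for illustration.

```sql
-- Sketch only: assumes one row per calendar day with no gaps,
-- and hardcodes the weights 4, 2, 1 instead of joining to _weight.
with _lagged as (
    select
        ref_date
      , value                                  as v0  -- current row
      , lag(value, 1) over (order by ref_date) as v1  -- one row back
      , lag(value, 2) over (order by ref_date) as v2  -- two rows back
    from
        _data
)
select
    ref_date
  , round((v0 * 4 + v1 * 2 + v2 * 1)::float / (4 + 2 + 1), 0) as weighted_average
from
    _lagged
where
    v2 is not null  -- drop the first two rows, which lack a full window
order by
    ref_date
;
```

On the sample data this should produce the same three rows as the self-join version. The self-join remains the safer choice when dates can be missing or when the weights need to stay configurable in a table.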