I want to estimate how far my seasonal forecast is from the actual data. I have the following dataset:
day real_revenue historical_coeff
01/01/2017 100 1.1
01/02/2017 105 0.98
01/03/2017 109 1.05
01/04/2017 107 1.07
01/05/2017 90 1
01/06/2017 120 0.95
01/07/2017 98 0.99
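On 01/01/2017, revenue = 100. The seasonal forecast takes each day's coefficient and applies it to the current revenue. So it forecasts that revenue on 01/02/2017 will be 100*1.1 = 110, on 01/03/2017 it will be 110*0.98 = 107.8, and so on. The forecasted remaining revenue is then the sum over all forecasted days. For example, for 01/01/2017 the sum after applying the daily coefficients is 688.274235.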
On the next day, 01/02/2017, we start from the value 105. So we forecast 105*0.98 = 102.9 for 01/03/2017, then 102.9*1.05 = 108.045 for 01/04/2017, and so on. The forecasted total remaining revenue is 531.2557215.
In the end I would like to get a table like this:
day forecasted_total_remaining_revenue
01/01/2017 688.274235
01/02/2017 531.2557
01/03/2017 ...
01/04/2017 ...
01/05/2017 ...
01/06/2017 ...
01/07/2017 ...
Basically, for each day I need the sum of the cumulative products, i.e. a + a*b + a*b*c + a*b*c*d + ...
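For reference, the computation described above can be sketched outside the database. The Python snippet below is only an illustration. It assumes the forecast for each starting day runs through the last day in the table and that the last day's own coefficient is never applied, since that reading reproduces the 688.274235 quoted for 01/01/2017. The function name `remaining_forecast` is made up for this sketch.

```python
# Sample data from the question: (day, real_revenue, historical_coeff).
rows = [
    ("01/01/2017", 100, 1.10),
    ("01/02/2017", 105, 0.98),
    ("01/03/2017", 109, 1.05),
    ("01/04/2017", 107, 1.07),
    ("01/05/2017", 90, 1.00),
    ("01/06/2017", 120, 0.95),
    ("01/07/2017", 98, 0.99),
]


def remaining_forecast(rows):
    """For each start day, sum the chained forecasts for all later days.

    Assumption: the coefficient of the final day is not applied, because
    there is no later day in the table to forecast.
    """
    result = []
    for i, (day, revenue, _) in enumerate(rows):
        forecast = revenue
        total = 0.0
        # Apply the coefficients of day i through the second-to-last day.
        for _, _, coeff in rows[i:-1]:
            forecast *= coeff  # forecast for the following day
            total += forecast  # a + a*b + a*b*c + ...
        result.append((day, total))
    return result


for day, total in remaining_forecast(rows):
    print(day, round(total, 6))
```

With the sample data this gives 688.274235 for 01/01/2017. For 01/02/2017 the same rule gives about 551.989, which differs from the 531.2557215 quoted above, so that figure may have been computed differently.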
Is it possible to write such a query in Vertica or in plain SQL?
Answer 0 (score: 1)
You can use ln() and exp() to get the product of the remaining values:
select t.*,
exp(sum(ln(historical_coeff)) over (order by day desc)) as factor
from t;
Of course, if historical_coeff is negative or zero, the expression gets more complicated.
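The identity behind this is exp(ln(x1) + ... + ln(xn)) = x1 * ... * xn, which only holds directly for positive values. As a rough illustration of what "more complicated" means, the Python sketch below tracks zeros and the sign separately and takes logs of absolute values. The helper `product_via_logs` is hypothetical, not part of any library. In SQL the same idea would need conditional aggregates for the zero check and the count of negative values.

```python
import math


def product_via_logs(values):
    """Multiply numbers using exp(sum(ln|x|)), handling zero and negatives."""
    # ln(0) is undefined, so any zero short-circuits the product to 0.
    if any(v == 0 for v in values):
        return 0.0
    # An odd number of negative factors makes the product negative.
    negatives = sum(1 for v in values if v < 0)
    sign = -1.0 if negatives % 2 else 1.0
    return sign * math.exp(sum(math.log(abs(v)) for v in values))


coeffs = [1.1, 0.98, 1.05, 1.07, 1.0, 0.95, 0.99]
print(product_via_logs(coeffs))           # close to math.prod(coeffs)
print(product_via_logs([2.0, -3.0]))      # close to -6.0
print(product_via_logs([2.0, 0.0, 5.0]))  # 0.0
```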
You can then take a cumulative sum of this factor to get the overall factor needed for the total:
select t.*,
       real_revenue * sum(factor) over (order by day desc) as forecasted_total_remaining_revenue
from (select t.*,
             exp(sum(ln(historical_coeff)) over (order by day desc)) as factor
      from t
     ) t;
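Note that a product taken over `order by day desc` multiplies the coefficients from each row to the end of the table, whereas the worked example in the question chains the coefficients forward from each starting day. A possible variant of the same ln()/exp() idea that follows the worked example is sketched below. It is run through Python's `sqlite3` module as a stand-in (assuming SQLite 3.25 or later for window functions), with `ln` and `exp` registered by hand. The window syntax is standard SQL, but it has not been tested on Vertica here. It also assumes every coefficient is strictly positive and that the last day's coefficient is not applied.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register ln/exp explicitly, since not every SQLite build ships them.
conn.create_function("ln", 1, math.log)
conn.create_function("exp", 1, math.exp)

conn.execute("create table t (day text, real_revenue real, historical_coeff real)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("2017-01-01", 100, 1.10),
        ("2017-01-02", 105, 0.98),
        ("2017-01-03", 109, 1.05),
        ("2017-01-04", 107, 1.07),
        ("2017-01-05", 90, 1.00),
        ("2017-01-06", 120, 0.95),
        ("2017-01-07", 98, 0.99),
    ],
)

query = """
with p as (
    select day, real_revenue, historical_coeff,
           -- running product of coefficients from the first day to this day
           exp(sum(ln(historical_coeff)) over (order by day)) as cum_prod,
           -- rn_desc = 1 marks the last day, whose coefficient is not applied
           row_number() over (order by day desc) as rn_desc
    from t
)
select day,
       -- historical_coeff / cum_prod rescales so the product restarts here
       real_revenue * historical_coeff / cum_prod
         * sum(case when rn_desc = 1 then 0 else cum_prod end)
             over (order by day
                   rows between current row and unbounded following)
         as forecasted_total_remaining_revenue
from p
order by day
"""

results = conn.execute(query).fetchall()
for day, total in results:
    print(day, round(total, 6))
```

The rescaling works because the product of coefficients from day i to day j equals the running product at j divided by the running product just before i.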
Answer 1 (score: 0)
In standard SQL (the syntax shown here is SQL Server), this can be done with a recursive CTE, provided the DBMS supports them.
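with rownums as (
    -- number the rows in date order so consecutive days can be joined
    select t.*, row_number() over (order by dt) as rn
    from tbl t
),
cte as (
    -- anchor row: the first day's revenue times its coefficient
    select rn, dt, real_revenue, historical_coeff,
           cast(real_revenue * historical_coeff as decimal(38,10)) as res
    from rownums
    where rn = 1
    union all
    -- recursive step: multiply the previous result by the next coefficient
    select t.rn, t.dt, t.real_revenue, t.historical_coeff,
           cast(c.res * t.historical_coeff as decimal(38,10))
    from rownums t
    join cte c on t.rn = c.rn + 1
)
select dt, sum(res) over (order by dt desc) as forecasted_remaining_revenue
from cte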
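The logic for excluding the last coefficient is not clear from the question. This query sums all the cumulative products from the given date through the last date.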
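To make the point about the last coefficient concrete, here is a small Python sketch (not SQL; the variable names are only for illustration) of what the query computes on the question's sample data: the running products anchored at the first day's revenue, then their sum from each row through the last row.

```python
from itertools import accumulate
from operator import mul

first_revenue = 100
coeffs = [1.1, 0.98, 1.05, 1.07, 1.0, 0.95, 0.99]

# res per row, as in the recursive CTE: revenue * c1, then * c2, and so on.
res = list(accumulate(coeffs, mul, initial=first_revenue))[1:]

# sum(res) over (order by dt desc): total from each row through the last row.
suffix_sums = list(accumulate(reversed(res)))[::-1]

total_with_last = suffix_sums[0]               # includes the 0.99 step
total_without_last = suffix_sums[0] - res[-1]  # drops the final coefficient
print(round(total_with_last, 6), round(total_without_last, 6))
```

On the sample data the first-row total is about 802.18 with the last coefficient and 688.274235 without it, the latter matching the figure in the question. Because the recursion is anchored at the first row's revenue, later rows do not restart from their own real_revenue.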
Answer 2 (score: 0)
I think you are looking for something like this (you may need to adjust the number of days in the interval):
SELECT
day,
SUM ( frev ) OVER ( ORDER BY day
RANGE BETWEEN CURRENT ROW AND INTERVAL '5 DAYS' FOLLOWING
) AS forecasted_total_remaining_revenue
FROM (
SELECT
day,
real_revenue *
EXP( SUM ( LN(historical_coeff)) OVER(
ORDER BY day
RANGE BETWEEN CURRENT ROW AND INTERVAL '5 DAYS' FOLLOWING
)
) AS frev
FROM
public.t1
) a
;