Is there a simple way to calculate a 12-month moving average in PostgreSQL?

Asked: 2019-10-24 12:06:19

Tags: postgresql datetime moving-average dynamicquery

This very simple SQL computes the average, median, etc. for well-defined time periods (e.g. year, month, quarter, week, day).

SELECT
  date_trunc('year', t.time2), -- or hour, day, week, month, year
  count(1), 
  percentile_cont(0.25) within group (order by t.price) as Q1,
  percentile_cont(0.5) within group (order by t.price) as Q2,
  percentile_cont(0.75) within group (order by t.price) as Q3,
  avg(t.price) as A,
  min(t.price) as Mi,
  max(t.price) as Mx

FROM my_table AS t
GROUP BY 1
ORDER BY 1;

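The table contains a list of individual transactions, each with a date (timestamp) and a price (bigint).

However, I am struggling to adapt it to compute running/moving values (e.g. over 4 weeks, 6 months, 2 quarters, or 12 months). How can this be done?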

Edit: The data looks like this:

[image: sample data, not preserved]

This is the expected result:

[image: expected result, not preserved]

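Edit 2:

Another issue I ran into is that the moving average, median, etc. should only be computed over a complete set of data.

What I mean is: if the data series starts in January 2000, then the first meaningful "12-month moving average" can only be computed for December 2000 (i.e. the first month that has a full 12 months of data behind it). For a 3-month moving average, the first meaningful value would be March 2000, and so on.

So I am thinking the logic of this query should be:

1) determine the start and end dates to use for computing the average, median, and other statistics, and then

2) loop over each start-end date pair and compute the average, median, etc.

To illustrate, the first part could be: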

WITH range_values AS ( -- min and max month of the data series
  SELECT date_trunc('month', min(time2)) AS minval,
         date_trunc('month', max(time2)) AS maxval
  FROM my_table
),
period_range(d) AS ( -- complete list of periods (here: months) covering the data series
  SELECT generate_series(minval, maxval, '1 month'::interval)
  FROM range_values
),
lookup_range AS ( -- start-end date pairs, one per period
  SELECT d AS enddate, d - interval '11 months' AS startdate
  FROM period_range
)
SELECT startdate, enddate
FROM lookup_range, range_values AS p
WHERE enddate >= p.minval + interval '11 months'; -- keep only ranges backed by a full 12 months of data

The second part could be (not a valid query, just to illustrate the logic):

SELECT
  count(1),
  percentile_cont(0.5) within group (order by t.price) as median_price,
  avg(t.price) as avg_price
FROM my_table as t, lookup_range as l
WHERE t.time2>= 'startdate' AND t.time2 < 'enddate'  

Now the challenge is: how do I combine the two, and how do I make it work with the fewest lines of code?

2 answers:

Answer 0 (score: 2)

I would first aggregate by month and then compute the moving average:

SELECT mon,
       sum(s_price) OVER w / sum(c_price) OVER w
FROM (SELECT date_trunc('month', time2::timestamp) AS mon,
             sum(price) AS s_price,
             count(price) AS c_price
      FROM my_table
      GROUP BY date_trunc('month', time2::timestamp)) AS q
WINDOW w AS (ORDER BY mon
             RANGE BETWEEN '6 months'::interval PRECEDING
                       AND '6 months'::interval FOLLOWING);
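
The window in this answer is centered on each month (6 months either side). For the trailing 12-month average the question asks about, a variant might look like the sketch below. It assumes PostgreSQL 11 or later (needed for RANGE frames with an interval offset) and the my_table, time2 and price names from the question; the aliases moving_avg and first_mon are introduced here for illustration only. The outer filter drops the leading months that do not yet have a full 12 months behind them.

```sql
SELECT mon, moving_avg
FROM (
    SELECT mon,
           -- total price / total row count over the trailing 12 months
           sum(s_price) OVER w / NULLIF(sum(c_price) OVER w, 0) AS moving_avg,
           -- first month present in the data, repeated on every row
           min(mon) OVER () AS first_mon
    FROM (SELECT date_trunc('month', time2::timestamp) AS mon,
                 sum(price)   AS s_price,
                 count(price) AS c_price
          FROM my_table
          GROUP BY 1) AS q
    WINDOW w AS (ORDER BY mon
                 RANGE BETWEEN '11 months'::interval PRECEDING
                           AND CURRENT ROW)
) AS r
-- keep only months whose window spans 12 full calendar months
WHERE mon >= first_mon + '11 months'::interval
ORDER BY mon;
```

This approach works for averages because monthly sums and counts can be added up across the window. It does not extend to medians or other percentiles: percentile_cont is an ordered-set aggregate, which cannot be used with OVER, and a median cannot be rebuilt from per-month medians.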

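Answer 1 (score: 0)

So, once again, I solved the puzzle on my own. I wonder, was my question really that hard?

Anyway, in case anyone is looking for a solution that computes moving averages, medians, percentiles, etc. over 1, 2, 3, 4, ... 6, ... 12 years/quarters/months/weeks/days/hours and returns all the summary statistics in one go, here is the answer: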

WITH grid AS (
    -- one row per 12-month window, covering [start_time, end_time)
    SELECT end_time, start_time
    FROM (
        SELECT end_time,
               -- the first 12 rows have no row 12 steps back; 'infinity' marks them for removal
               lag(end_time, 12, 'infinity') OVER (ORDER BY end_time) AS start_time
        FROM (
            -- month boundaries from the first month of data to the month after the last
            SELECT generate_series(date_trunc('month', min(time2)),
                                   date_trunc('month', max(time2)) + interval '1 month',
                                   interval '1 month') AS end_time
            FROM my_table
        ) sub
    ) sub2
    WHERE end_time > start_time -- drops the incomplete leading windows
)
SELECT to_char(date_trunc('month', a.end_time - interval '1 month'), 'YYYY-MM') AS d,
       count(e.time2),
       percentile_cont(0.25) WITHIN GROUP (ORDER BY e.price) AS Q1,
       percentile_cont(0.5)  WITHIN GROUP (ORDER BY e.price) AS median,
       percentile_cont(0.75) WITHIN GROUP (ORDER BY e.price) AS Q3,
       avg(e.price) AS Aver,
       min(e.price) AS Mi,
       max(e.price) AS Mx
FROM grid a
LEFT JOIN my_table e ON e.time2 >= a.start_time
                    AND e.time2 <  a.end_time
GROUP BY a.end_time
ORDER BY d DESC;

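Note that the table contains a list of individual timestamped records (such as sales transactions), as shown in the example in the actual question.

One more point: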

to_char(date_trunc('month',a.end_time - interval '1 month'), 'YYYY-MM') as d

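is for display purposes only. That is, the convention in PostgreSQL is that the "end of the month" is actually hour 0 of the following month (i.e. the end of October 2019 is "2019.11.01 at 00:00:00"). The same applies to any time range (e.g. the end of 2019 is actually "2020.01.01 at 00:00:00"). So if "- interval '1 month'" is not included, the 12-month moving statistics ending in October 2019 would be shown as being "for" 2019-11-01 00:00:00 (rendered as 2019-11).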
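
The effect of the one-month shift can be checked on its own with a couple of literal boundaries. This is only an illustrative sketch: the inline VALUES list and the column aliases unshifted_label and shifted_label are made up for the demonstration and are not part of the answer's query.

```sql
SELECT end_time,
       -- label taken directly from the exclusive upper bound of the window
       to_char(end_time, 'YYYY-MM') AS unshifted_label,
       -- label shifted back to the last month actually covered by the window
       to_char(date_trunc('month', end_time - interval '1 month'), 'YYYY-MM') AS shifted_label
FROM (VALUES (timestamp '2019-11-01 00:00:00'),
             (timestamp '2020-01-01 00:00:00')) AS v(end_time);
```

For the boundary 2019-11-01 00:00:00 the unshifted label is 2019-11 and the shifted label is 2019-10; for 2020-01-01 00:00:00 they are 2020-01 and 2019-12 respectively.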