SQL workaround to replace FOLLOWING / PRECEDING in PostgreSQL 8.4

Asked: 2015-05-07 17:17:38

Tags: sql postgresql window-functions postgresql-8.4 moving-average

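I have a query that uses the FOLLOWING / PRECEDING syntax of PostgreSQL 9.0 to compute a basic moving average. To my horror, I discovered that our pg server runs on 8.4 and cannot be upgraded in the near future.

I am therefore looking for the simplest way to write a backwards-compatible version of the following query: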

SELECT time_series,
       avg_price AS daily_price,
       CASE WHEN row_number() OVER (ORDER BY time_series) > 7 
        THEN avg(avg_price) OVER (ORDER BY time_series DESC ROWS BETWEEN 0 FOLLOWING
                                                                     AND 6 FOLLOWING)
        ELSE NULL 
       END AS avg_price
FROM (
   SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series,
          SUM(price) / COUNT(itemname) AS avg_price
   FROM auction_prices 
   WHERE itemname = 'iphone6_16gb' AND price < 1000
   GROUP BY time_series
   ) sub

It is a basic 7-day moving average over a table containing price and timestamp columns:

closing_date timestamp
price        numeric
itemname     text

The requirement for simplicity comes from my basic knowledge of SQL.

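3 Answers:

Answer 0: (score: 1)

Postgres 8.4 already has CTEs.
I suggest using one: compute the daily averages in a CTE, then self-join to all days (existing or not) of the past week. Finally, aggregate once more for the weekly average: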

WITH cte AS (
   SELECT closing_date::date AS closing_day
        , sum(price)   AS day_sum
        , count(price) AS day_ct
   FROM   auction_prices
   WHERE  itemname = 'iphone6_16gb'
   AND    price <= 1000  -- including upper border
   GROUP  BY 1
   )   
SELECT d.closing_day
     , CASE WHEN d.day_ct > 1
            THEN d.day_sum / d.day_ct
            ELSE d.day_sum
       END AS avg_day         -- also avoids division-by-zero
     , CASE WHEN sum(w.day_ct) > 1
            THEN sum(w.day_sum) / sum(w.day_ct)
            ELSE sum(w.day_sum)
       END AS week_avg_proper  -- also avoids division-by-zero
FROM   cte d
JOIN   cte w ON w.closing_day BETWEEN d.closing_day - 6 AND d.closing_day
GROUP  BY d.closing_day, d.day_sum, d.day_ct
ORDER  BY 1;

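SQL Fiddle. (Runs on Postgres 9.3, but should work in 8.4 as well.)

Notes

  • I used a different (correct) algorithm to calculate the weekly average. See the considerations in my comment to the question.

  • This calculates averages for every day in the base table, including corner cases. But there is no row for days without any rows.

  • An integer can be subtracted from a date: d.closing_day - 6. (But not from a varchar or timestamp!)

  • It is rather confusing that you call a timestamp column closing_date: it is not a date, it is a timestamp. And time_series for a result column holding a date value? I use closing_day instead ...

  • Note how I count prices with count(price), not items with COUNT(itemname), which would be an entry point for a sneaky error if either of the columns can be NULL. If neither can be NULL, count(*) does the same job.

  • The CASE construct avoids the division-by-zero error, which could occur whenever the column you are counting can be NULL. I could have used COALESCE for that purpose, but while at it I also simplified the case of exactly 1 price.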
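To make the first and fifth notes concrete, here is a minimal sketch on made-up data. The sample rows, the CTE names (sample, daily) and the output column names are assumptions for illustration only, not part of the original answer. It contrasts the weighted weekly average with an average of daily averages, and shows how counting a different column changes the result once a price is NULL:

```sql
-- Hypothetical sample data: two sales on day 1, one sale on day 2,
-- plus one row on day 2 whose price is unknown (NULL).
WITH sample(closing_day, itemname, price) AS (
   VALUES (date '2015-05-01', 'iphone6_16gb', 100::numeric)
        , (date '2015-05-01', 'iphone6_16gb', 200)
        , (date '2015-05-02', 'iphone6_16gb', 600)
        , (date '2015-05-02', 'iphone6_16gb', NULL)
   )
, daily AS (
   SELECT closing_day
        , sum(price)      AS day_sum
        , count(price)    AS day_ct    -- ignores the row with the NULL price
        , count(itemname) AS item_ct   -- counts the row with the NULL price, too
   FROM   sample
   GROUP  BY closing_day
   )
SELECT sum(day_sum) / sum(day_ct)  AS weighted_avg       -- (100 + 200 + 600) / 3
     , avg(day_sum / day_ct)       AS avg_of_daily_avgs  -- (150 + 600) / 2
     , sum(day_sum) / sum(item_ct) AS skewed_by_null     -- (100 + 200 + 600) / 4
FROM   daily;
```

The three columns differ on this data, which is why the query above carries day_sum and day_ct through the self-join instead of averaging the per-day averages.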

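Answer 1: (score: 0)

PostgreSQL 8.4... Wasn't that back when everybody thought Windows 95 was great? Anyway...

The only option I can think of is a stored procedure with a scrollable cursor that does the math manually: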

CREATE FUNCTION auction_prices(item text, price_limit real)
  RETURNS TABLE (closing_date timestamp, avg_day real, avg_7_day real) AS $$
DECLARE
  last_date  date;
  first_date date;
  cur        refcursor;
  rec        record;
  dt         date;
  today      date;
  today_avg  real;
  p          real;
  sum_p      real;
  n          integer;
BEGIN
  -- There may be days when an item was not traded under the price limit, so need a
  -- series of consecutive days to find all days. Find the end-points of that
  -- interval.
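  -- Table columns are qualified with the alias "ap" so they cannot be confused
  -- with the output column closing_date of this function.
  SELECT max(ap.closing_date), min(ap.closing_date) INTO last_date, first_date
  FROM auction_prices ap
  WHERE ap.itemname = item AND ap.price < price_limit;

  -- Need at least some data, so quit if item was never traded under the price limit.
  -- An aggregate query always returns one row, so test for NULL rather than FOUND.
  IF last_date IS NULL THEN
    RETURN;
  END IF;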

  -- Create a scrollable cursor over the auction_prices daily average and the
  -- series of consecutive days. The LEFT JOIN means that you will get a NULL
  -- for avg_price on days without trading.
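  OPEN cur SCROLL FOR
    SELECT days.series_day::date, sub.avg_price
    FROM generate_series(last_date, first_date, interval '-1 day') AS days(series_day)
    LEFT JOIN (
      -- Group by calendar day, not by the full timestamp, and expose the day
      -- so the join condition below can reference it.
      SELECT ap.closing_date::date AS closing_day,
             sum(ap.price) / count(ap.itemname) AS avg_price
      FROM auction_prices ap
      WHERE ap.itemname = item AND ap.price < price_limit
      GROUP BY 1
    ) sub ON sub.closing_day = days.series_day::date
    ORDER BY days.series_day DESC; -- newest day first; do not rely on join order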

  <<all_recs>>
  LOOP -- over the entire date series
    -- Get today's data (today = first day of 7-day period)
    FETCH cur INTO today, today_avg;
    EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
    IF today_avg IS NULL THEN
      n := 0;
      sum_p := 0.0;
    ELSE
      n := 1;
      sum_p := today_avg;
    END IF;

    -- Loop over the remaining 6 days
    FOR i IN 2 .. 7 LOOP
      FETCH cur INTO dt, p;
      EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
      IF p IS NOT NULL THEN
        sum_p := sum_p + p;
        n := n + 1;
      END IF;
    END LOOP;

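    -- Save the data to the result set: in a RETURNS TABLE function the output
    -- columns are assigned first, then RETURN NEXT (without arguments) emits the row
    closing_date := today;
    avg_day      := today_avg;
    IF n > 0 THEN
      avg_7_day := sum_p / n;
    ELSE
      avg_7_day := NULL;
    END IF;
    RETURN NEXT;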

    -- Move the cursor back to the starting row of the next 7-day period
    MOVE RELATIVE -6 FROM cur;
  END LOOP all_recs;
  CLOSE cur;

  RETURN;
END; $$ LANGUAGE plpgsql STRICT;

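A few notes:

  • There may be dates on which the item was not traded under the price limit. To get accurate moving averages, you need to include those days. Generate a series of consecutive dates spanning the period in which the item was indeed traded under the price limit and you will get accurate results.
  • The cursor needs to be scrollable so that you can look ahead 6 days to earlier dates to get the data needed for the calculation, and then move back 6 days to calculate the average for the next day.
  • You cannot calculate a moving average for the last 6 days. The simple reason is that the MOVE command needs a constant number of records to move; parameter substitution is not supported. On the upside, your moving average will always cover 7 days (not all of which may have seen trading).
  • This function will by no means be fast, but it should work. No guarantees though, I have not worked on an 8.4 box for years.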
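The first note, generating consecutive days and outer-joining the daily averages onto them, can also be tried in plain SQL outside the function. The following is only a sketch: the sample rows, the CTE name daily and the fixed date range are assumptions for illustration. It relies on the three-argument timestamp form of generate_series(), which is available from 8.4 on:

```sql
-- Hypothetical daily averages with a gap on 2015-05-03.
WITH daily(closing_day, avg_price) AS (
   VALUES (date '2015-05-01', 500::numeric)
        , (date '2015-05-02', 520)
        , (date '2015-05-04', 480)
   )
SELECT days.series_day::date AS closing_day
     , daily.avg_price       -- NULL on days without trading
FROM   generate_series(timestamp '2015-05-01'
                     , timestamp '2015-05-04'
                     , interval '1 day') AS days(series_day)
LEFT   JOIN daily ON daily.closing_day = days.series_day::date
ORDER  BY 1;
```

The result has one row per calendar day, including the day without trading, which is what the cursor in the function scrolls over.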

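Using this function is rather straightforward. Since it returns a table, you can use it in a FROM clause like any other table (and even JOIN it to other relations):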

SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series, avg_day, avg_7_day
FROM auction_prices('iphone6_16gb', 1000);

Answer 2: (score: 0)

        -- make a subset and rank it on date
WITH xxx AS (
        SELECT
        rank() OVER(ORDER BY closing_date) AS rnk
        , closing_date
        , price
        FROM auction_prices
        WHERE itemname = 'iphone6_16gb' AND price < 1000
        )
        -- select subset, + aggregate on self-join
SELECT this.*
        , (SELECT AVG(price) AS mean
                FROM xxx that
                WHERE that.rnk > this.rnk + 0 -- <<-- adjust window
                AND that.rnk < this.rnk + 7   -- <<-- here
                )
FROM xxx this
ORDER BY this.rnk
        ;
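  • Note: the CTE is for convenience (Postgres 8.4 does have CTEs), but it could be replaced by a subquery or, more elegantly, by a view.
  • The code assumes the time series has no gaps (one observation per {product * day}). If that is not the case: join with a calendar table (which could also contain the rank).
  • (Also note that I did not cover the corner cases.)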
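If gaps are possible and a calendar table is not at hand, one alternative is to key the correlated subquery on the date itself rather than on the rank, so the window always spans 7 calendar days. This is a sketch under assumptions, not the answerer's method: the sample rows and the CTE name daily are made up, and it averages the daily averages without weighting them by the number of sales:

```sql
-- Hypothetical daily averages with gaps in the series.
WITH daily(closing_day, avg_price) AS (
   VALUES (date '2015-05-01', 500::numeric)
        , (date '2015-05-02', 520)
        , (date '2015-05-04', 480)
        , (date '2015-05-09', 450)
   )
SELECT this.closing_day
     , this.avg_price
     , (SELECT avg(that.avg_price)
        FROM   daily that
        WHERE  that.closing_day >  this.closing_day - 7   -- date - integer
        AND    that.closing_day <= this.closing_day       -- window ends today
       ) AS avg_7_day
FROM   daily this
ORDER  BY this.closing_day;
```

Days without any row still produce no output row here; only the window boundaries are protected against gaps.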