Finding the correct partition with the LAG window function

Asked: 2014-03-01 20:33:05

Tags: sql postgresql window-functions

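I have daily time series for different companies from different industries, and I am working with PostgreSQL. Let me explain my problem with an example. I have this: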

+------------+---------+-------------+----+
|    day     | company | industry    | v  |
+------------+---------+-------------+----+
| 2012-01-12 | A       | consumer    | 2  |
| 2012-01-12 | B       | consumer    | 2  |
| 2012-01-12 | C       | health      | 4  |
| 2012-01-12 | D       | health      | 4  |
| 2012-01-13 | A       | consumer    | 5  |
| 2012-01-13 | B       | consumer    | 5  |
| 2012-01-13 | C       | health      | 7  |
| 2012-01-13 | D       | health      | 7  |
| 2012-01-16 | A       | consumer    | 8  |
| 2012-01-16 | B       | consumer    | 8  |
| 2012-01-16 | C       | health      | 3  |
| 2012-01-16 | D       | health      | 3  |
+------------+---------+-------------+----+
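
To make the example reproducible, here is a minimal setup sketch. The table name `mytable` comes from the queries in this post; the column types (`date`, `text`, `numeric`) and the use of a temporary table are assumptions, since the post does not state them.

```sql
-- Assumed schema: the post only shows column names, not types.
CREATE TEMP TABLE mytable (
    day      date    NOT NULL,
    company  text    NOT NULL,
    industry text    NOT NULL,
    v        numeric NOT NULL   -- numeric avoids integer division later on
);

-- Sample rows copied from the table above.
INSERT INTO mytable (day, company, industry, v) VALUES
    ('2012-01-12', 'A', 'consumer', 2),
    ('2012-01-12', 'B', 'consumer', 2),
    ('2012-01-12', 'C', 'health',   4),
    ('2012-01-12', 'D', 'health',   4),
    ('2012-01-13', 'A', 'consumer', 5),
    ('2012-01-13', 'B', 'consumer', 5),
    ('2012-01-13', 'C', 'health',   7),
    ('2012-01-13', 'D', 'health',   7),
    ('2012-01-16', 'A', 'consumer', 8),
    ('2012-01-16', 'B', 'consumer', 8),
    ('2012-01-16', 'C', 'health',   3),
    ('2012-01-16', 'D', 'health',   3);
```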

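Each company carries a value v, which is the daily average for its industry, so all companies in the same industry share the same v on a given day. What I need is: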

+------------+---------+----------+---+------------+
|    day     | company | industry | v | delta_v    |
+------------+---------+----------+---+------------+
| 2012-01-12 | A       | consumer | 2 | NULL       |
| 2012-01-12 | B       | consumer | 2 | NULL       |
| 2012-01-12 | C       | health   | 4 | NULL       |
| 2012-01-12 | D       | health   | 4 | NULL       |
| 2012-01-13 | A       | consumer | 5 | 1.5        |
| 2012-01-13 | B       | consumer | 5 | 1.5        |
| 2012-01-13 | C       | health   | 7 | 0.75       |
| 2012-01-13 | D       | health   | 7 | 0.75       |
| 2012-01-16 | A       | consumer | 8 | 0.6        |
| 2012-01-16 | B       | consumer | 8 | 0.6        |
| 2012-01-16 | C       | health   | 3 | -0.571428  |
| 2012-01-16 | D       | health   | 3 | -0.571428  |
+------------+---------+----------+---+------------+

I need the day-over-day relative change in v. For example, the average v for the "consumer" industry is 2 on 2012-01-12 and 5 on 2012-01-13, so the growth is (5 - 2) / 2 = 1.5.
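
Written out as a formula (the symbols $v_t$ and $v_{t-1}$ are notation introduced here for the value on a given day and on the previous available day in the same industry):

$$
\Delta v_t = \frac{v_t - v_{t-1}}{v_{t-1}} = \frac{v_t}{v_{t-1}} - 1
$$

Checking against the expected table: for "health", $(7 - 4)/4 = 0.75$ on 2012-01-13 and $(3 - 7)/7 \approx -0.571428$ on 2012-01-16. The first day has no predecessor, hence NULL.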

I tried:

    SELECT * 
           , (v - LAG(v) OVER (PARTITION BY industry ORDER BY day) )
           / LAG (v) OVER (PARTITION BY industry ORDER BY day) AS delta_v
    FROM mytable
    ORDER BY day, industry

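The problem is that when an industry has more than one company on the same day, the change in v is also computed "intraday": a row's predecessor can be another company's row from the same day rather than a row from the previous day.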
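
The effect can be made visible by selecting what `LAG` actually picks up. This is only a sketch: the inline `sample` CTE and the `prev_*` column names are made up for illustration, and because rows with the same `day` are tied in the window ordering, which of the two companies ends up second within a day is not guaranteed.

```sql
-- Self-contained illustration using a subset of the data from the question.
WITH sample(day, company, industry, v) AS (
    VALUES
        (DATE '2012-01-12', 'A', 'consumer', 2.0),
        (DATE '2012-01-12', 'B', 'consumer', 2.0),
        (DATE '2012-01-13', 'A', 'consumer', 5.0),
        (DATE '2012-01-13', 'B', 'consumer', 5.0)
)
SELECT day,
       company,
       v,
       LAG(company) OVER w AS prev_company,  -- which row LAG looked at
       LAG(day)     OVER w AS prev_day,      -- equals day for intraday pairs
       LAG(v)       OVER w AS prev_v
FROM sample
WINDOW w AS (PARTITION BY industry ORDER BY day)  -- same window as the attempt above
ORDER BY day, company;
```

On each day, one of the two rows should show `prev_day` equal to its own `day`, which is the intraday comparison that produces a delta of 0 instead of the intended value.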

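I was hoping this would only need a small correction in the PARTITION BY clause, but I really can't figure out how to do it. Do you have any ideas that could help me?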

1 answer:

Answer 0 (score: 2)

I think you want company in the partition as well:

    SELECT t.*,
           ((v - LAG(v) OVER (PARTITION BY industry, company ORDER BY day))
            / LAG(v) OVER (PARTITION BY industry, company ORDER BY day)
           ) AS delta_v
    FROM mytable t
    ORDER BY day, industry;

I'm not sure whether Postgres actually evaluates lag() twice, but this version is easier to maintain:

    SELECT t.*,
           (v / LAG(v) OVER (PARTITION BY industry, company ORDER BY day)) - 1 AS delta_v
    FROM mytable t
    ORDER BY day, industry;
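
Two details the post does not settle are the type of `v` and whether it can be zero. If `v` happens to be an integer column, `v / LAG(v)` would use integer division in PostgreSQL and truncate the ratio; a zero previous value would raise a division-by-zero error. The following sketch guards against both, under those assumptions. The inline `sample` CTE is illustrative only, and the named `WINDOW` clause is just a way to write the window definition once.

```sql
-- Self-contained variant: integer v on purpose, to show why the cast matters.
WITH sample(day, company, industry, v) AS (
    VALUES
        (DATE '2012-01-12', 'A', 'consumer', 2),
        (DATE '2012-01-13', 'A', 'consumer', 5),
        (DATE '2012-01-16', 'A', 'consumer', 8)
)
SELECT day,
       company,
       industry,
       v,
       -- Cast to numeric so 5 / 2 gives 2.5, not 2;
       -- NULLIF turns a zero divisor into NULL instead of raising an error.
       v::numeric / NULLIF(LAG(v) OVER w, 0) - 1 AS delta_v
FROM sample
WINDOW w AS (PARTITION BY industry, company ORDER BY day)
ORDER BY day, industry, company;
```

Note that partitioning by company compares each company with its own previous row, so this matches the expected output only as long as every company has a row for every day in its industry.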