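Postgres window function (determining contiguous days)

Time: 2016-02-18 20:29:54

Tags: sql postgresql window-functions gaps-and-islands

Using Postgres 9.3, I'm trying to count the number of contiguous days of a certain weather type. Assume we have a regular time series with a weather report for each day: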

date|weather
"2016-02-01";"Sunny"
"2016-02-02";"Cloudy"
"2016-02-03";"Snow"
"2016-02-04";"Snow"
"2016-02-05";"Cloudy"
"2016-02-06";"Sunny"
"2016-02-07";"Sunny"
"2016-02-08";"Sunny"
"2016-02-09";"Snow"
"2016-02-10";"Snow"

I would like something that counts the contiguous days of the same weather. The result should look like this:

date|weather|contiguous_days 
"2016-02-01";"Sunny";1
"2016-02-02";"Cloudy";1
"2016-02-03";"Snow";1
"2016-02-04";"Snow";2
"2016-02-05";"Cloudy";1
"2016-02-06";"Sunny";1
"2016-02-07";"Sunny";2
"2016-02-08";"Sunny";3
"2016-02-09";"Snow";1
"2016-02-10";"Snow";2

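I've been struggling with this for a while, trying to use window functions. At first it seemed like it should be a no-brainer, but it turned out to be much harder than I expected.

Here is what I've tried...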

Select date, weather, Row_Number() Over (partition by weather order by date)
  from t_weather

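Would it be easier to compare the current row with the next one? How would you do that while keeping a running count? Any thoughts, ideas, or even solutions would be helpful! -Kip

4 answers:

Answer 0: (score: 2)

You need to identify the contiguous runs where the weather is the same. You can do this by adding a grouping identifier. There is a simple way: number the rows of each weather type by date and subtract that number (in days) from the date. Within a run of consecutive dates with the same weather, the difference is constant.

Once you have the grouping, the rest is row_number():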

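-- Inner query: grp is constant within a run of consecutive same-weather days.
-- Outer query: number the rows inside each (weather, grp) run.
Select date, weather,
       Row_Number() Over (partition by weather, grp order by date) as contiguous_days
from (select w.*, 
             (date - row_number() over (partition by weather order by date) * interval '1 day') as grp
      from t_weather w
     ) w;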

The SQL Fiddle is here.
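To see why the subtraction gives a constant per run, here is a minimal Python sketch (not part of the original answer) that mimics the two row_number() steps on the sample data from the question. The function name contiguous_days and the in-memory list are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample data from the question: ten consecutive days starting 2016-02-01.
weather = ["Sunny", "Cloudy", "Snow", "Snow", "Cloudy",
           "Sunny", "Sunny", "Sunny", "Snow", "Snow"]
rows = [(date(2016, 2, 1) + timedelta(days=i), w) for i, w in enumerate(weather)]


def contiguous_days(rows):
    """Hypothetical helper: grp = date - row number per weather, then count per (weather, grp)."""
    per_weather = defaultdict(int)  # row_number() over (partition by weather order by date)
    per_group = defaultdict(int)    # row_number() over (partition by weather, grp order by date)
    result = []
    for day, w in sorted(rows):
        per_weather[w] += 1
        # Constant within a run of consecutive same-weather days.
        grp = day - timedelta(days=per_weather[w])
        per_group[(w, grp)] += 1
        result.append((day, w, grp, per_group[(w, grp)]))
    return result


for day, w, grp, n in contiguous_days(rows):
    print(day, w, grp, n)
```

Printing grp next to each row shows that it only changes when a run is interrupted, which is what makes it usable as a partition key.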

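Answer 1: (score: 2)

I'm not sure what the query engine does when it has to scan the same data set several times (a bit like computing the area under a curve), but this works...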

WITH v(date, weather) AS (
VALUES 
('2016-02-01'::date,'Sunny'::text),
('2016-02-02','Cloudy'),
('2016-02-03','Snow'),
('2016-02-04','Snow'),
('2016-02-05','Cloudy'),
('2016-02-06','Sunny'),
('2016-02-07','Sunny'),
('2016-02-08','Sunny'),
('2016-02-09','Snow'),
('2016-02-10','Snow') ),
changes AS (
SELECT date, 
    weather, 
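    -- 1 = same weather as the previous day, 0 = first day of a new run.
    -- ORDER BY makes lag() look at the previous date rather than relying on input order.
    CASE WHEN lag(weather) OVER (ORDER BY date) = weather THEN 1 ELSE 0 END change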
FROM v)
SELECT date
    , weather
    ,(SELECT count(weather) -- number of times the weather didn't change
      FROM changes v2 
      WHERE v2.date <= v1.date AND v2.weather = v1.weather
        AND v2.date >= ( -- bounded between changes of weather
            SELECT max(date) 
            FROM changes v3 
            WHERE change = 0 
            AND v3.weather = v1.weather 
            AND v3.date <= v1.date)  --<-- here's the expensive part
    ) curve
FROM changes v1
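
One way to read the nested subqueries: for each day, find the most recent day on or before it where this weather started a new run, then count the same-weather days from that start up to the current day. A rough Python sketch of that reading follows, assuming one row per day. The function name curve_counts is made up for illustration, and the nested loops deliberately mirror the repeated scans the answer is worried about.

```python
from datetime import date, timedelta

# Sample data from the question: ten consecutive days starting 2016-02-01.
weather = ["Sunny", "Cloudy", "Snow", "Snow", "Cloudy",
           "Sunny", "Sunny", "Sunny", "Snow", "Snow"]
rows = [(date(2016, 2, 1) + timedelta(days=i), w) for i, w in enumerate(weather)]


def curve_counts(rows):
    """Hypothetical helper: per day, count same-weather days since the latest run start."""
    rows = sorted(rows)
    # Days on which a run starts (first row, or weather differs from the previous row).
    starts = [(d, w) for i, (d, w) in enumerate(rows)
              if i == 0 or rows[i - 1][1] != w]
    result = []
    for d1, w1 in rows:
        # Latest run start for this weather on or before the current day.
        start = max(d for d, w in starts if w == w1 and d <= d1)
        # Same-weather days between that start and the current day.
        result.append(sum(1 for d, w in rows if w == w1 and start <= d <= d1))
    return result


print(curve_counts(rows))
```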

Answer 2: (score: 1)

You can do this with a recursive CTE, as follows:

WITH RECURSIVE CTE_ConsecutiveDays AS
(
    SELECT
        my_date,
        weather,
        1 AS consecutive_days
    FROM My_Table T
    WHERE
        NOT EXISTS (SELECT * FROM My_Table T2 WHERE T2.my_date = T.my_date - INTERVAL '1 day' AND T2.weather = T.weather)
    UNION ALL
    SELECT
        T.my_date,
        T.weather,
        CD.consecutive_days + 1
    FROM
        CTE_ConsecutiveDays CD
    INNER JOIN My_Table T ON
        T.my_date = CD.my_date + INTERVAL '1 day' AND
        T.weather = CD.weather
)
SELECT *
FROM CTE_ConsecutiveDays
ORDER BY my_date;

Here is a SQL Fiddle to test it: http://www.sqlfiddle.com/#!15/383e5/3
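
For readers less familiar with recursive CTEs, the same two-part structure can be sketched in plain Python: an anchor step that finds the first day of every run, and a repeated step that extends each run by one day while the weather stays the same. This is only an illustration under the assumption of one row per date; the dict named table and the function name consecutive_days are hypothetical.

```python
from datetime import date, timedelta

# Sample data from the question, keyed by date (assumes one row per date).
weather = ["Sunny", "Cloudy", "Snow", "Snow", "Cloudy",
           "Sunny", "Sunny", "Sunny", "Snow", "Snow"]
table = {date(2016, 2, 1) + timedelta(days=i): w for i, w in enumerate(weather)}


def consecutive_days(table):
    """Hypothetical helper mirroring the anchor and recursive members of the CTE."""
    one_day = timedelta(days=1)
    # Anchor member: days whose previous day is missing or has different weather.
    frontier = [(d, w, 1) for d, w in table.items() if table.get(d - one_day) != w]
    result = []
    # Recursive member: extend each run by one day while the weather repeats.
    while frontier:
        result.extend(frontier)
        frontier = [(d + one_day, w, n + 1) for d, w, n in frontier
                    if table.get(d + one_day) == w]
    return sorted(result)  # ORDER BY my_date


for row in consecutive_days(table):
    print(*row)
```

The loop stops once no run can be extended, just as the recursive member stops when its join returns no rows.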

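Answer 3: (score: 1)

Here is another approach, based on this answer.

First, we add a change column that is 1 or 0, depending on whether or not the weather differs from the previous day.
Then we introduce a group_nr column by summing change over an order by date window. This yields a unique group number for each sequence of consecutive same-weather days, because the sum only increases on the first day of each sequence.
Finally, we apply row_number() over (partition by group_nr order by date) to produce the running count within each group.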

select date, weather, row_number() over (partition by group_nr order by date)
from (
  select *, sum(change) over (order by date) as group_nr
  from (
    select *, (weather != lag(weather,1,'') over (order by date))::int as change
    from tmp_weather
  ) t1
) t2;

sqlfiddle (using the equivalent WITH syntax)
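
The three steps can also be traced outside the database. The following Python sketch (an illustration only; the function name running_counts is made up) computes the same intermediate columns, change and group_nr, for the sample data in the question and then numbers the rows within each group.

```python
from datetime import date, timedelta
from itertools import accumulate

# Sample data from the question: ten consecutive days starting 2016-02-01.
weather = ["Sunny", "Cloudy", "Snow", "Snow", "Cloudy",
           "Sunny", "Sunny", "Sunny", "Snow", "Snow"]
rows = [(date(2016, 2, 1) + timedelta(days=i), w) for i, w in enumerate(weather)]


def running_counts(rows):
    """Hypothetical helper: change flag, running sum as group number, row number per group."""
    labels = [w for _, w in sorted(rows)]
    # change: 1 when the weather differs from the previous day
    # (the first day is compared against '', like lag(weather, 1, '')).
    change = [int(w != prev) for prev, w in zip([""] + labels, labels)]
    # group_nr: sum(change) over (order by date)
    group_nr = list(accumulate(change))
    # row_number() over (partition by group_nr order by date)
    counts, seen = [], {}
    for g in group_nr:
        seen[g] = seen.get(g, 0) + 1
        counts.append(seen[g])
    return change, group_nr, counts


change, group_nr, counts = running_counts(rows)
print(change)
print(group_nr)
print(counts)
```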