Using Postgres 9.3, I'm trying to count the number of contiguous days of a given weather type. Assume we have a regular time series of weather reports:
date|weather
"2016-02-01";"Sunny"
"2016-02-02";"Cloudy"
"2016-02-03";"Snow"
"2016-02-04";"Snow"
"2016-02-05";"Cloudy"
"2016-02-06";"Sunny"
"2016-02-07";"Sunny"
"2016-02-08";"Sunny"
"2016-02-09";"Snow"
"2016-02-10";"Snow"
I want something that counts up the contiguous days of the same weather. The result should look something like this:
date|weather|contiguous_days
"2016-02-01";"Sunny";1
"2016-02-02";"Cloudy";1
"2016-02-03";"Snow";1
"2016-02-04";"Snow";2
"2016-02-05";"Cloudy";1
"2016-02-06";"Sunny";1
"2016-02-07";"Sunny";2
"2016-02-08";"Sunny";3
"2016-02-09";"Snow";1
"2016-02-10";"Snow";2
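I've been struggling with this for a while, trying to use window functions. At first it seemed like a no-brainer, but it turned out to be much harder than I expected.
Here is what I've tried: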
Select date, weather, Row_Number() Over (partition by weather order by date)
from t_weather
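For context, here is a minimal plain-Python sketch of what partitioning by weather alone computes on the sample data. It is not PostgreSQL code; the names `ROWS` and `row_number_by_weather` are made up for illustration. It suggests why the attempt falls short: the counter for a weather type keeps growing across separate runs instead of restarting.

```python
from collections import defaultdict
from datetime import date

# Sample data from the question: one row per day, 2016-02-01 .. 2016-02-10.
WEATHER = ["Sunny", "Cloudy", "Snow", "Snow", "Cloudy",
           "Sunny", "Sunny", "Sunny", "Snow", "Snow"]
ROWS = [(date(2016, 2, day), w) for day, w in enumerate(WEATHER, start=1)]


def row_number_by_weather(rows):
    """Mimic Row_Number() Over (partition by weather order by date)."""
    seen = defaultdict(int)  # number of rows seen so far per weather type
    out = []
    for d, w in sorted(rows):
        seen[w] += 1
        out.append((d, w, seen[w]))
    return out


attempt = row_number_by_weather(ROWS)
for d, w, n in attempt:
    print(d.isoformat(), w, n)
```

On this data the Snow rows on 2016-02-09 and 2016-02-10 come out as 3 and 4 rather than the desired 1 and 2, because nothing in the partition separates the two Snow runs.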
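Would it be easier to compare the current row to the next one? How would you do that while still maintaining a count? Any thoughts, ideas, or even solutions would be helpful! -Kip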
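Answer 0 (score: 2)
You need to identify the contiguous runs where the weather is the same. You can do this by adding a grouping identifier. There is a simple method: subtract an increasing sequence of numbers (a per-weather row_number()) from the dates; the difference is constant for contiguous dates with the same weather.
Once you have the grouping, the rest is row_number():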
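-- Inner query: date minus the per-weather row number is constant within a run
-- of consecutive same-weather days, so it serves as a group key (grp).
-- Outer query: number the rows inside each (weather, grp) group.
Select date, weather,
       Row_Number() Over (partition by weather, grp order by date) as contiguous_days
from (select w.*,
             (date - row_number() over (partition by weather order by date) * interval '1 day') as grp
      from t_weather w
     ) w;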
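To see why the subtraction yields a usable group key, here is a small plain-Python sketch of the same two steps on the question's sample data. It is an illustration of the arithmetic only, not PostgreSQL code; `ROWS` and `contiguous_days` are hypothetical names, and it assumes at most one row per date.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample data from the question: one row per day, 2016-02-01 .. 2016-02-10.
WEATHER = ["Sunny", "Cloudy", "Snow", "Snow", "Cloudy",
           "Sunny", "Sunny", "Sunny", "Snow", "Snow"]
ROWS = [(date(2016, 2, day), w) for day, w in enumerate(WEATHER, start=1)]


def contiguous_days(rows):
    """Return (date, weather, grp, count) using the date-minus-row-number key."""
    # Step 1: grp = date - (row number within the same weather type) days.
    per_weather = defaultdict(int)
    keyed = []
    for d, w in sorted(rows):
        per_weather[w] += 1
        keyed.append((d, w, d - timedelta(days=per_weather[w])))

    # Step 2: row number within each (weather, grp) group.
    per_group = defaultdict(int)
    out = []
    for d, w, grp in keyed:
        per_group[(w, grp)] += 1
        out.append((d, w, grp, per_group[(w, grp)]))
    return out


result = contiguous_days(ROWS)
for d, w, grp, n in result:
    print(d.isoformat(), w, grp.isoformat(), n)
```

Within a run, both the date and the per-weather row number advance by one per day, so their difference stays fixed. When a different weather type interrupts, the date jumps ahead while the row number does not, which produces a new key.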
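The original answer linked a SQL Fiddle for this query.
Answer 1 (score: 2)
I'm not sure what the query engine will do when it scans the same data set multiple times (it's a bit like calculating the area under a curve), but this works: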
WITH v(date, weather) AS (
VALUES
('2016-02-01'::date,'Sunny'::text),
('2016-02-02','Cloudy'),
('2016-02-03','Snow'),
('2016-02-04','Snow'),
('2016-02-05','Cloudy'),
('2016-02-06','Sunny'),
('2016-02-07','Sunny'),
('2016-02-08','Sunny'),
('2016-02-09','Snow'),
('2016-02-10','Snow') ),
changes AS (
SELECT date,
weather,
CASE WHEN lag(weather) OVER (ORDER BY date) = weather THEN 1 ELSE 0 END AS change -- 1 = same as the previous day, 0 = first day of a run
FROM v)
SELECT date
, weather
,(SELECT count(weather) -- number of times the weather didn't change
FROM changes v2
WHERE v2.date <= v1.date AND v2.weather = v1.weather
AND v2.date >= ( -- bounded between changes of weather
SELECT max(date)
FROM changes v3
WHERE change = 0
AND v3.weather = v1.weather
AND v3.date <= v1.date) --<-- here's the expensive part
) curve
FROM changes v1
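The logical work of this query can be sketched in plain Python as follows. This only mirrors the query's logic (find the latest run start for the row's weather, then count matching rows back to it); how PostgreSQL actually executes the correlated subqueries depends on the planner. `ROWS` and `run_lengths_by_rescan` are hypothetical names.

```python
from datetime import date

# Sample data from the question: one row per day, 2016-02-01 .. 2016-02-10.
WEATHER = ["Sunny", "Cloudy", "Snow", "Snow", "Cloudy",
           "Sunny", "Sunny", "Sunny", "Snow", "Snow"]
ROWS = [(date(2016, 2, day), w) for day, w in enumerate(WEATHER, start=1)]


def run_lengths_by_rescan(rows):
    """For every row, rescan the data to find its run start and count back to it."""
    rows = sorted(rows)
    # Rows where the weather differs from the previous row (change = 0 in the query).
    starts = [(d, w) for i, (d, w) in enumerate(rows)
              if i == 0 or rows[i - 1][1] != w]
    out = []
    for d, w in rows:
        # Innermost subquery: latest run start of this weather on or before d.
        start = max(sd for sd, sw in starts if sw == w and sd <= d)
        # Middle subquery: same-weather rows between that start and d.
        count = sum(1 for d2, w2 in rows if w2 == w and start <= d2 <= d)
        out.append((d, w, count))
    return out


curve = run_lengths_by_rescan(ROWS)
for d, w, n in curve:
    print(d.isoformat(), w, n)
```

Written this way, each output row triggers two passes over the data, which is the cost the answer flags as the expensive part.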
Answer 2 (score: 1)
You can accomplish this with a recursive CTE, as follows:
WITH RECURSIVE CTE_ConsecutiveDays AS
(
SELECT
my_date,
weather,
1 AS consecutive_days
FROM My_Table T
WHERE
NOT EXISTS (SELECT * FROM My_Table T2 WHERE T2.my_date = T.my_date - INTERVAL '1 day' AND T2.weather = T.weather)
UNION ALL
SELECT
T.my_date,
T.weather,
CD.consecutive_days + 1
FROM
CTE_ConsecutiveDays CD
INNER JOIN My_Table T ON
T.my_date = CD.my_date + INTERVAL '1 day' AND
T.weather = CD.weather
)
SELECT *
FROM CTE_ConsecutiveDays
ORDER BY my_date;
Here is a SQL Fiddle to test with: http://www.sqlfiddle.com/#!15/383e5/3
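The way the recursive CTE builds its result can be sketched in plain Python: an anchor step picks the first day of every run, and each iteration extends the runs found so far by one day. This is an illustrative model rather than PostgreSQL code; `ROWS` and `consecutive_days` are hypothetical names, and it assumes at most one row per date.

```python
from datetime import date, timedelta

# Sample data from the question: one row per day, 2016-02-01 .. 2016-02-10.
WEATHER = ["Sunny", "Cloudy", "Snow", "Snow", "Cloudy",
           "Sunny", "Sunny", "Sunny", "Snow", "Snow"]
ROWS = [(date(2016, 2, day), w) for day, w in enumerate(WEATHER, start=1)]


def consecutive_days(rows):
    """Model the anchor member and the recursive member of the CTE."""
    by_date = dict(rows)  # date -> weather
    one_day = timedelta(days=1)

    # Anchor member: days with no same-weather row on the previous day.
    frontier = [(d, w, 1) for d, w in rows if by_date.get(d - one_day) != w]

    result = []
    while frontier:
        result.extend(frontier)
        # Recursive member: join each row found last round to the next day,
        # keeping it only if the weather matches.
        frontier = [(d + one_day, w, n + 1) for d, w, n in frontier
                    if by_date.get(d + one_day) == w]
    return sorted(result)  # ORDER BY my_date


days = consecutive_days(ROWS)
for d, w, n in days:
    print(d.isoformat(), w, n)
```

The loop runs once per day of the longest run (three rounds that produce rows on the sample data), so the number of iterations follows the longest streak rather than the table size.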
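Answer 3 (score: 1)
Here is another approach, based on this answer (linked in the original post).
First, we add a change column that is 1 or 0, depending on whether or not the weather differs from the previous day.
Then we introduce a group_nr column by summing change over an order by date window. This produces a unique group number for each sequence of consecutive same-weather days, because the sum is only incremented on the first day of each sequence.
Finally, we do a row_number() over (partition by group_nr order by date) to produce the running count per group.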
select date, weather, row_number() over (partition by group_nr order by date)
from (
select *, sum(change) over (order by date) as group_nr
from (
select *, (weather != lag(weather,1,'') over (order by date))::int as change
from tmp_weather
) t1
) t2;
A SQL Fiddle (using the equivalent WITH syntax) was linked in the original post.
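The three steps above can be traced in plain Python to show the intermediate change and group_nr values on the sample data. This is a sketch of the logic only, not PostgreSQL code; `ROWS` and `running_count` are hypothetical names.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate

# Sample data from the question: one row per day, 2016-02-01 .. 2016-02-10.
WEATHER = ["Sunny", "Cloudy", "Snow", "Snow", "Cloudy",
           "Sunny", "Sunny", "Sunny", "Snow", "Snow"]
ROWS = [(date(2016, 2, day), w) for day, w in enumerate(WEATHER, start=1)]


def running_count(rows):
    """Return (date, weather, change, group_nr, count) for each row."""
    rows = sorted(rows)
    weather = [w for _, w in rows]

    # Step 1: change = 1 when the weather differs from the previous row.
    # The empty string plays the role of lag()'s default for the first row.
    previous = [""] + weather[:-1]
    change = [int(w != p) for w, p in zip(weather, previous)]

    # Step 2: group_nr = running sum of change, ordered by date.
    group_nr = list(accumulate(change))

    # Step 3: row number within each group_nr.
    per_group = defaultdict(int)
    out = []
    for (d, w), c, g in zip(rows, change, group_nr):
        per_group[g] += 1
        out.append((d, w, c, g, per_group[g]))
    return out


trace = running_count(ROWS)
for d, w, c, g, n in trace:
    print(d.isoformat(), w, c, g, n)
```

Unlike the date-subtraction key, group_nr depends only on row order, so this variant does not rely on the dates being evenly spaced. It also means a gap in the dates would not by itself start a new run.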