GROUP BY consecutive dates delimited by gaps

Time: 2012-10-22 10:51:43

Tags: sql postgresql aggregate window-functions

Assume you have (in Postgres 9.1) a table like this:

date | value 

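It has some gaps (that is, not every possible date between min(date) and max(date) has a row).

My problem is how to aggregate this data so that each consistent group (a run of dates without gaps) is treated separately, like this: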

min_date | max_date | [some aggregate of "value" column] 

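Any ideas how to do it? I believe it is possible with window functions, but after a while of trying lag() and lead() I'm a little stuck.

For instance, if the data is like this: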

 date          | value  
---------------+-------  
 2011-10-31    | 2  
 2011-11-01    | 8  
 2011-11-02    | 10  
 2012-09-13    | 1  
 2012-09-14    | 4  
 2012-09-15    | 5  
 2012-09-16    | 20  
 2012-10-30    | 10  

The output (with sum as the aggregate) would be:

   min     |    max     |  sum  
-----------+------------+-------  
2011-10-31 | 2011-11-02 |  20  
2012-09-13 | 2012-09-16 |  30  
2012-10-30 | 2012-10-30 |  10  

2 answers:

Answer 0 (score: 10)

create table t ("date" date, "value" int);
insert into t ("date", "value") values
    ('2011-10-31', 2),
    ('2011-11-01', 8),
    ('2011-11-02', 10),
    ('2012-09-13', 1),
    ('2012-09-14', 4),
    ('2012-09-15', 5),
    ('2012-09-16', 20),
    ('2012-10-30', 10);

A simpler and cheaper version:

select min("date"), max("date"), sum(value)
from (
    select
        "date", value,
        "date" - (dense_rank() over(order by "date"))::int g
    from t
) s
group by s.g
order by 1
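
To see why subtracting the rank from the date yields a usable group key, here is a minimal Python sketch of the same idea, using only the standard library and the sample rows from the question. The function name `group_islands` is made up for illustration and is not part of the answer. Within a run of consecutive dates, both the date and its dense rank grow by one per row, so their difference stays constant; a gap makes the date jump by more than the rank does, which changes the difference.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows copied from the question: (date, value)
rows = [
    (date(2011, 10, 31), 2),
    (date(2011, 11, 1), 8),
    (date(2011, 11, 2), 10),
    (date(2012, 9, 13), 1),
    (date(2012, 9, 14), 4),
    (date(2012, 9, 15), 5),
    (date(2012, 9, 16), 20),
    (date(2012, 10, 30), 10),
]


def group_islands(rows):
    """Return (min_date, max_date, sum) per run of consecutive dates."""
    ordered = sorted(rows)
    # Dense rank: each distinct date gets the next integer, starting at 1.
    distinct_dates = sorted({d for d, _ in ordered})
    rank = {d: i for i, d in enumerate(distinct_dates, start=1)}
    # Group key: the date minus its rank in days (the "g" column in the query).
    keyed = [(d - timedelta(days=rank[d]), d, v) for d, v in ordered]
    result = []
    # The key never decreases in date order, so adjacent grouping is enough.
    for _, members in groupby(keyed, key=lambda item: item[0]):
        members = list(members)
        total = sum(v for _, _, v in members)
        result.append((members[0][1], members[-1][1], total))
    return result


islands = group_islands(rows)
for lo, hi, total in islands:
    print(lo, hi, total)
```

For the sample data the three keys are 2011-10-30, 2012-09-09 and 2012-10-22. The key values themselves mean nothing; they only need to be equal within a run and different across runs.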

My first attempt was more complicated and more expensive:

create temporary sequence s;
select min("date"), max("date"), sum(value)
from (
    select 
        "date", value, d,
        case 
            when lag("date", 1, null) over(order by s.d) is null and "date" is not null 
                then nextval('s')
            when lag("date", 1, null) over(order by s.d) is not null and "date" is not null 
                then lastval()
            else 0 
        end g
    from 
        t
        right join
        generate_series(
            (select min("date") from t)::date, 
            (select max("date") from t)::date + 1, 
            '1 day'
        ) s(d) on s.d::date = t."date"
) q
where g != 0
group by g
order by 1
;
drop sequence s;

Output:

    min     |    max     | sum 
------------+------------+-----
 2011-10-31 | 2011-11-02 |  20
 2012-09-13 | 2012-09-16 |  30
 2012-10-30 | 2012-10-30 |  10
(3 rows)

Answer 1 (score: 0)

Here is one way to solve it.

First, to get the beginnings of the consecutive series, this query gives you the first date of each:

SELECT first.date
FROM raw_data first
     LEFT OUTER JOIN raw_data prior_first ON first.date = prior_first.date + 1
WHERE prior_first.date IS NULL

Likewise for the ends of the consecutive series:

SELECT last.date
FROM raw_data last
     LEFT OUTER JOIN raw_data after_last ON last.date = after_last.date - 1
WHERE after_last.date IS NULL

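You might consider turning these into views to simplify the queries that use them.

We only need the first dates to form the group ranges: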

CREATE VIEW beginnings AS
SELECT first.date
FROM raw_data first
     LEFT OUTER JOIN raw_data prior_first ON first.date = prior_first.date + 1
WHERE prior_first.date IS NULL;

CREATE VIEW endings AS
SELECT last.date
FROM raw_data last
     LEFT OUTER JOIN raw_data after_last ON last.date = after_last.date - 1
WHERE after_last.date IS NULL;

SELECT MIN(raw.date), MAX(raw.date), SUM(raw.value)
FROM raw_data raw
  INNER JOIN (SELECT lo.date AS lo_date, MIN(hi.date) AS hi_date
              FROM beginnings lo, endings hi
              WHERE lo.date <= hi.date  -- <= so that a one-day run pairs with itself
              GROUP BY lo.date) range
     ON raw.date >= range.lo_date AND raw.date <= range.hi_date
GROUP BY range.lo_date;
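
The beginnings-and-endings approach can also be traced outside the database. The following Python sketch is only an illustration under stated assumptions: it uses the sample rows from the question, stands in for the two views with sorted lists, and the function name `island_ranges` is hypothetical. A date is a beginning when the previous day is absent, and an ending when the next day is absent; each beginning is then paired with the earliest ending that is not before it.

```python
from datetime import date, timedelta

# Sample rows from the question, keyed by date.
rows = {
    date(2011, 10, 31): 2,
    date(2011, 11, 1): 8,
    date(2011, 11, 2): 10,
    date(2012, 9, 13): 1,
    date(2012, 9, 14): 4,
    date(2012, 9, 15): 5,
    date(2012, 9, 16): 20,
    date(2012, 10, 30): 10,
}

ONE_DAY = timedelta(days=1)
dates = set(rows)

# Counterpart of the "beginnings" view: no row exists for the previous day.
beginnings = sorted(d for d in dates if d - ONE_DAY not in dates)
# Counterpart of the "endings" view: no row exists for the next day.
endings = sorted(d for d in dates if d + ONE_DAY not in dates)


def island_ranges():
    """Pair each beginning with its ending and sum the values in between."""
    result = []
    for lo in beginnings:
        # Inclusive comparison: a one-day run begins and ends on the same date.
        hi = min(e for e in endings if e >= lo)
        total = sum(v for d, v in rows.items() if lo <= d <= hi)
        result.append((lo, hi, total))
    return result


ranges = island_ranges()
for lo, hi, total in ranges:
    print(lo, hi, total)
```

The pairing step compares every beginning with every ending, just as the cross join in the query does, so on a large table this is likely to cost more than the single-pass rank-based grouping in the first answer.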