SQL query to group monthly periods into ranges

Time: 2015-01-08 19:29:58

Tags: sql postgresql

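I am having trouble building a query that groups rows into ranges of consecutive months, based on the months in which each item exists. I am using PostgreSQL.

For example, I have a table with the following data: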

Name    Period(text)
Ana     2010/09
Ana     2010/10
Ana     2010/11
Ana     2010/12
Ana     2011/01
Ana     2011/02
Peter   2009/05
Peter   2009/06
Peter   2009/07
Peter   2009/08
Peter   2009/12
Peter   2010/01
Peter   2010/02
Peter   2010/03
John    2009/05
John    2009/06
John    2009/09
John    2009/11
John    2009/12

I would like the query to return:

Name    Start     End
Ana     2010/09   2011/02
Peter   2009/05   2009/08
Peter   2009/12   2010/03
John    2009/05   2009/06
John    2009/09   2009/09
John    2009/11   2009/12

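Is there any way to achieve this?

3 answers:

Answer 0: (score: 7)

This is an aggregation problem, but with a twist: you need to define the groups of adjacent months for each name.

Assuming that a month never appears more than once for a given name, you can do this by assigning a "month" number to each period and subtracting a sequential row number. The difference is constant across a run of consecutive months.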

select name, min(period), max(period)
from (select t.*,
             (cast(left(period, 4) as int) * 12 + cast(right(period, 2) as int) -
              row_number() over (partition by name order by period)
             ) as grp
      from names t
     ) t
group by grp, name;
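
To see why the difference stays constant, the grp expression can be traced by hand. The following is only a minimal Python sketch of that arithmetic, not part of the answer's query; month_number and group_keys are hypothetical helper names, and the data is Peter's rows from the question.

```python
# Worked illustration of the grouping key used in the query above:
# (year * 12 + month) minus the row's 1-based position within its name.
# month_number and group_keys are illustrative names, not from the answer.

def month_number(period):
    """Mirror cast(left(period, 4) as int) * 12 + cast(right(period, 2) as int)."""
    return int(period[:4]) * 12 + int(period[-2:])

def group_keys(sorted_periods):
    """Return month_number minus row_number for each period, in order."""
    return [month_number(p) - rn for rn, p in enumerate(sorted_periods, start=1)]

# Peter's periods from the question; 'YYYY/MM' text sorts chronologically.
peter = sorted(["2009/05", "2009/06", "2009/07", "2009/08",
                "2009/12", "2010/01", "2010/02", "2010/03"])

keys = group_keys(peter)
print(keys)  # the first four keys are equal, and so are the last four
```

Each gap in the months makes the month number jump while the row number advances by only one, so the key changes exactly where a new range begins.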

Here is a SQL Fiddle illustrating this.

Note: duplicates are not really a problem either. You can use dense_rank() instead of row_number().
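
As a rough check of that note, the sketch below compares the two numbering schemes on a list with a repeated month. It is plain Python standing in for the window functions, and the duplicated 2009/06 row is a made-up example, not data from the question.

```python
# Compare row_number-style and dense_rank-style numbering when a month repeats.
# All function names here are illustrative; the duplicate row is hypothetical.

def month_number(period):
    return int(period[:4]) * 12 + int(period[-2:])

def row_number_keys(sorted_periods):
    """Key = month number minus a strictly increasing row number."""
    return [month_number(p) - rn for rn, p in enumerate(sorted_periods, start=1)]

def dense_rank_keys(sorted_periods):
    """Key = month number minus a rank that repeats for equal periods."""
    keys, rank, previous = [], 0, None
    for p in sorted_periods:
        if p != previous:
            rank += 1
            previous = p
        keys.append(month_number(p) - rank)
    return keys

with_duplicate = ["2009/05", "2009/06", "2009/06", "2009/07"]

print(row_number_keys(with_duplicate))  # two distinct keys: the run is split
print(dense_rank_keys(with_duplicate))  # one key: the run stays together
```

With row numbers the duplicate shifts every later key by one, which splits a single consecutive run; with dense ranks the duplicate shares its rank and the key is unchanged.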

Answer 1: (score: 6)

I don't know if there is an easier way (there probably is), but I can't think of one right now:

with parts as (
  select name, 
         to_date(replace(period,'/',''), 'yyyymm') as period
  from names
), flagged as (
  select name, 
         period, 
         case 
           when lag(period,1, (period - interval '1' month)::date) over (partition by name order by period) = (period - interval '1' month)::date then null
           else 1
         end as group_flag
  from parts
), grouped as (
  select flagged.*, 
         coalesce(sum(group_flag) over (partition by name order by period),0) as group_nr
  from flagged
)
select name, min(period), max(period)
from grouped
group by name, group_nr
order by name, min(period);

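The first common table expression (parts) simply converts the period to a date so that it can be used in arithmetic expressions.

The second CTE (flagged) assigns a flag whenever the gap (in months) between the current row and the previous row is not one.

The third CTE (grouped) then accumulates those flags, which gives each run of consecutive rows its own group number.

The final select then just gets the start and end of each group. I didn't bother converting the periods back to the original format.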
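
The flag-and-running-sum steps described above can be traced outside the database. This is a minimal Python sketch under the assumption that periods are 'YYYY/MM' strings already sorted per name; month_index and month_ranges are hypothetical names, and the input is John's rows from the question.

```python
from itertools import accumulate

def month_index(period):
    """Turn 'YYYY/MM' into a single integer so gaps can be measured in months."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)

def month_ranges(sorted_periods):
    """Flag each row that starts a new run, then take a running sum of the flags."""
    idx = [month_index(p) for p in sorted_periods]
    # Like the query, the first row is not flagged, so the first group is number 0.
    flags = [0 if i == 0 or idx[i] - idx[i - 1] == 1 else 1
             for i in range(len(idx))]
    group_nr = list(accumulate(flags))
    ranges = {}
    for nr, p in zip(group_nr, sorted_periods):
        start, _ = ranges.get(nr, (p, p))
        ranges[nr] = (start, p)  # keep the first period as start, latest as end
    return flags, group_nr, list(ranges.values())

john = ["2009/05", "2009/06", "2009/09", "2009/11", "2009/12"]
flags, group_nr, ranges = month_ranges(john)
print(flags)     # [0, 0, 1, 1, 0]
print(group_nr)  # [0, 0, 1, 2, 2]
print(ranges)    # [('2009/05', '2009/06'), ('2009/09', '2009/09'), ('2009/11', '2009/12')]
```

The ranges match the three rows the question expects for John.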

The SQLFiddle example also shows the intermediate result of the flagged CTE:
http://sqlfiddle.com/#!15/8c0aa/2

Answer 2: (score: 2)

One of the common ways to do this could be recursive SQL:

with recursive cte1 as (
    select
        "Name" as name,
        ("Period"||'/01')::date as period
    from Table1
), cte2 as (
    select
        c.name, c.period as s, c.period as e
    from cte1 as c
    where not exists (select * from cte1 as t where t.name = c.name and t.period = c.period - interval '1 month')

    union all

    select
        c.name, c.s as s, t.period
    from cte2 as c
        inner join cte1 as t on t.name = c.name and t.period = c.e + interval '1 month'

)   
select
    c.name, to_char(c.s, 'YYYY/MM') as "Start", to_char(max(c.e), 'YYYY/MM') as "End"
from cte2 as c
group by c.name, c.s
order by 1, 2

I'm not sure about the performance of this; you'll have to test it.
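
The recursive query can be read as two steps: pick every period that has no predecessor month for the same name, then keep extending its end by one month while the next month exists. A loose Python sketch of that reading follows; it uses a loop rather than recursion, the helper names are hypothetical, and the data is Peter's rows from the question.

```python
def month_index(period):
    """Turn 'YYYY/MM' into a single integer month count."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)

def to_period(index):
    """Inverse of month_index: integer month count back to 'YYYY/MM'."""
    year, month = divmod(index - 1, 12)
    return f"{year:04d}/{month + 1:02d}"

def ranges_by_extension(periods):
    present = {month_index(p) for p in periods}
    # Anchor step: months whose previous month is absent start a range.
    starts = sorted(i for i in present if i - 1 not in present)
    result = []
    for start in starts:
        end = start
        # Extension step: advance the end while the following month is present.
        while end + 1 in present:
            end += 1
        result.append((to_period(start), to_period(end)))
    return result

peter = ["2009/05", "2009/06", "2009/07", "2009/08",
         "2009/12", "2010/01", "2010/02", "2010/03"]
print(ranges_by_extension(peter))
# [('2009/05', '2009/08'), ('2009/12', '2010/03')]
```

Unlike this loop, the query emits one row per extension step and then takes max(e) per start, so it produces intermediate rows for every month in a range; that is one reason to measure its performance on real data.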

sql fiddle demo