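I'm having some trouble building a query that groups items into month ranges, based on the months in which each item exists. I'm using PostgreSQL.
For example, I have a table with this data: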
Name Period(text)
Ana 2010/09
Ana 2010/10
Ana 2010/11
Ana 2010/12
Ana 2011/01
Ana 2011/02
Peter 2009/05
Peter 2009/06
Peter 2009/07
Peter 2009/08
Peter 2009/12
Peter 2010/01
Peter 2010/02
Peter 2010/03
John 2009/05
John 2009/06
John 2009/09
John 2009/11
John 2009/12
I would like the query to return:
Name Start End
Ana 2010/09 2011/02
Peter 2009/05 2009/08
Peter 2009/12 2010/03
John 2009/05 2009/06
John 2009/09 2009/09
John 2009/11 2009/12
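Is there any way to achieve this?
Answer 0 (score: 7)
This is an aggregation problem, but with a twist: you need to define the groups of adjacent months for each name.
Assuming a month never appears more than once for a given name, you can do this by assigning a "month" number to each period and subtracting a sequential row number. The difference stays constant across a run of consecutive months.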
select name, min(period), max(period)
from (select t.*,
             (cast(left(period, 4) as int) * 12 + cast(right(period, 2) as int) -
              row_number() over (partition by name order by period)
             ) as grp
      from names t
     ) t
group by grp, name;
Here is a SQL Fiddle that illustrates this.
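To see why the subtraction works, here is a minimal Python trace of the same arithmetic. It is only an illustration and not part of the query; the helper names `month_number` and `month_ranges` are made up for this sketch, and it assumes no period is repeated.

```python
from itertools import groupby

def month_number(period):
    """Turn 'YYYY/MM' into a running month count (year * 12 + month)."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)

def month_ranges(periods):
    """Collapse distinct 'YYYY/MM' strings into (start, end) runs of consecutive months."""
    ordered = sorted(periods)  # zero-padded text sorts chronologically
    # Same key as the query: month number minus the 1-based row number.
    keyed = [(month_number(p) - i, p) for i, p in enumerate(ordered, start=1)]
    ranges = []
    for _, run in groupby(keyed, key=lambda pair: pair[0]):
        members = [p for _, p in run]
        ranges.append((members[0], members[-1]))
    return ranges

peter = ["2009/05", "2009/06", "2009/07", "2009/08",
         "2009/12", "2010/01", "2010/02", "2010/03"]
print(month_ranges(peter))
```

For Peter's rows the key is the same for 2009/05 through 2009/08 and jumps once at the gap before 2009/12, so the two ranges from the question fall out of a plain group-by on that key.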
Note: duplicates are not really a problem either. You can use dense_rank() instead of row_number().
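A small Python sketch of why the ranking function matters when a month is repeated. The sample rows and the helper `grp_keys` are hypothetical; the `dense` switch imitates the numbering of row_number() versus dense_rank().

```python
def month_number(period):
    """Turn 'YYYY/MM' into a running month count (year * 12 + month)."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)

def grp_keys(periods, dense):
    """Return month number minus rank for each row, in sorted order.

    dense=False numbers rows 1, 2, 3, ... like row_number();
    dense=True gives tied rows the same rank, like dense_rank().
    """
    ordered = sorted(periods)
    if dense:
        rank_of = {p: r for r, p in enumerate(sorted(set(ordered)), start=1)}
        ranks = [rank_of[p] for p in ordered]
    else:
        ranks = list(range(1, len(ordered) + 1))
    return [month_number(p) - r for p, r in zip(ordered, ranks)]

# Hypothetical rows for one name, with 2009/06 appearing twice.
rows = ["2009/05", "2009/06", "2009/06", "2009/07", "2009/09"]
print(grp_keys(rows, dense=False))
print(grp_keys(rows, dense=True))
```

With plain row numbering the repeated month shifts the key part-way through the 2009/05 to 2009/07 run, so that run is split. With dense ranking all four of those rows share one key and only 2009/09 gets a different one.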
Answer 1 (score: 6)
I don't know if there is an easier way (there probably is), but I can't think of one right now:
with parts as (
   select name,
          to_date(replace(period,'/',''), 'yyyymm') as period
   from names
), flagged as (
   select name,
          period,
          case
             when lag(period,1, (period - interval '1' month)::date) over (partition by name order by period) = (period - interval '1' month)::date then null
             else 1
          end as group_flag
   from parts
), grouped as (
   select flagged.*,
          coalesce(sum(group_flag) over (partition by name order by period),0) as group_nr
   from flagged
)
select name, min(period), max(period)
from grouped
group by name, group_nr
order by name, min(period);
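The first common table expression (parts) simply converts the period to a date so that it can be used in arithmetic expressions.
The second CTE (flagged) assigns a flag whenever the gap between the current row and the previous row is not exactly one month.
The third CTE then accumulates these flags to give each run of consecutive rows its own group number.
The final select then just gets the start and end of each group. I didn't bother converting the period back to the original format.
The SQLFiddle example also shows the intermediate result of the flagged CTE:
http://sqlfiddle.com/#!15/8c0aa/2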
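The flag-then-accumulate idea can also be traced outside the database. The following is a rough Python sketch, not the query itself: `group_numbers` is a made-up helper, and it uses 0 where the SQL produces NULL (which the coalesce turns into 0 anyway).

```python
from itertools import accumulate

def month_number(period):
    """Turn 'YYYY/MM' into a running month count (year * 12 + month)."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)

def group_numbers(periods):
    """Pair each period with a running group number, one name at a time."""
    ordered = sorted(periods)
    months = [month_number(p) for p in ordered]
    # Flag 1 when the previous row is not exactly one month earlier.
    # The first row gets 0, mirroring the default passed to lag() in the query.
    flags = [0] + [0 if cur - prev == 1 else 1
                   for prev, cur in zip(months, months[1:])]
    # A running sum of the flags plays the role of sum(...) over (order by period).
    return list(zip(ordered, accumulate(flags)))

john = ["2009/05", "2009/06", "2009/09", "2009/11", "2009/12"]
for period, group_nr in group_numbers(john):
    print(period, group_nr)
```

For John's rows the running sum steps up at 2009/09 and again at 2009/11, giving three groups, which matches the three ranges expected in the question.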
Answer 2 (score: 2)
One common way to do this is with recursive SQL:
with recursive cte1 as (
    select
        "Name" as name,
        ("Period"||'/01')::date as period
    from Table1
), cte2 as (
    select
        c.name, c.period as s, c.period as e
    from cte1 as c
    where not exists (select * from cte1 as t where t.name = c.name and t.period = c.period - interval '1 month')
    union all
    select
        c.name, c.s as s, t.period
    from cte2 as c
        inner join cte1 as t on t.name = c.name and t.period = c.e + interval '1 month'
)
select
    c.name, to_char(c.s, 'YYYY/MM') as "Start", to_char(max(c.e), 'YYYY/MM') as "End"
from cte2 as c
group by c.name, c.s
order by 1, 2
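The two halves of the recursive CTE can be read as "find the months that start a range, then keep stepping forward one month at a time". A loose Python sketch of that reading follows; `month_ranges_by_extension` is a hypothetical helper, and it walks straight to the last month of each range rather than producing every intermediate row and taking max() as the query does.

```python
def month_number(period):
    """Turn 'YYYY/MM' into a running month count (year * 12 + month)."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)

def month_ranges_by_extension(periods):
    """Return (start, end) ranges for one name by extending each start month."""
    by_month = {month_number(p): p for p in periods}
    # Anchor step: a month starts a range when the month before it is absent.
    starts = [m for m in sorted(by_month) if m - 1 not in by_month]
    ranges = []
    for start in starts:
        end = start
        # Recursive step: extend the range while the next month exists.
        while end + 1 in by_month:
            end += 1
        ranges.append((by_month[start], by_month[end]))
    return ranges

ana = ["2010/09", "2010/10", "2010/11", "2010/12", "2011/01", "2011/02"]
print(month_ranges_by_extension(ana))
```

Ana's six months cross a year boundary but form a single unbroken run, so only one range comes back.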
I'm not sure how the recursive query performs; you will have to test it.
sql fiddle demo