I've spent a lot of time on the following problem:
Imagine you have N groups, each with several records, and every record has a unique starting point and ending point.
In other words:
ID | GroupName | StartingPoint | EndingPoint | seq(row_number) | desired_seq
___|___________|_______________|_____________|_________________|____________
1  | Grp1      | 2014-01-06    | 2014-01-07  | 1               | 1
2  | Grp1      | 2014-01-07    | 2014-01-08  | 2               | 2
3  | Grp2      | 2014-01-08    | 2014-01-09  | 1               | 1
4  | Grp1      | 2014-01-09    | 2014-01-10  | 3               | 1
5  | Grp2      | 2014-01-10    | 2014-01-11  | 2               | 1
As you can see, the starting point of each consecutive record is the same as the ending point of the previous record.
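Basically, I want the minimum StartingPoint and maximum EndingPoint of each group, based on the dates. As soon as a record with a different group name appears, it should be treated as a new group and the sequence should restart.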
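A plain row_number() is not enough for this, because it does not reflect changes in the group name. (I included a seq column in the sample data showing the values row_number generates.)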
Desired result based on the sample data:
1 | Grp1 | 2014-01-06 | 2014-01-08
2 | Grp2 | 2014-01-08 | 2014-01-09
3 | Grp1 | 2014-01-09 | 2014-01-10
4 | Grp2 | 2014-01-10 | 2014-01-11
What I tried:
;with cte as(
select *
, row_number() over (partition by GroupName order by startingpoint) as seq
from table1
)
select *
into #temp2
from cte t1
left join cte t2 on t1.id=t2.id and t1.seq= t2.seq-1
select *
,(select startingPoint from #temp2 t2 where t1.id=t2.id and t2.seq= (select MIN(seq) from #temp2)) as Oldest
,(select startingPoint from #temp2 t2 where t1.id=t2.id and t2.seq= (select MAX(seq) from #temp2)) as MostRecent
from #temp2 t1
Answer 0 (score: 3)
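This is a gaps-and-islands problem over subgroups. The trick is to group by the difference between two ROW_NUMBER() values, one partitioned by GroupName and one unpartitioned.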
WITH t AS (
SELECT
GroupName,
StartingPoint,
EndingPoint,
ROW_NUMBER() OVER(PARTITION BY GroupName ORDER BY StartingPoint)
- ROW_NUMBER() OVER(ORDER BY StartingPoint) AS SubGroupId
FROM table1
)
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(StartingPoint)) AS SortOrderId,
GroupName AS GroupName,
MIN(StartingPoint) AS GroupStartingPoint,
MAX(EndingPoint) AS GroupEndingPoint
FROM t
GROUP BY GroupName, SubGroupId
ORDER BY SortOrderId
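The Python sketch below replays the query's logic on the question's sample rows to show why the difference of the two row numbers identifies each run. The rows are hard-coded and ISO date strings stand in for dates. Within a run of consecutive rows that share a GroupName, both counters advance by one per row, so their difference stays constant. When another group interrupts, only the unpartitioned counter advances, so the difference changes and a new SubGroupId begins. The function name `islands` is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (Id, GroupName, StartingPoint, EndingPoint).
# Fixed-width ISO date strings sort correctly as plain strings.
ROWS = [
    (1, "Grp1", "2014-01-06", "2014-01-07"),
    (2, "Grp1", "2014-01-07", "2014-01-08"),
    (3, "Grp2", "2014-01-08", "2014-01-09"),
    (4, "Grp1", "2014-01-09", "2014-01-10"),
    (5, "Grp2", "2014-01-10", "2014-01-11"),
]


def islands(rows):
    """Collapse consecutive runs of the same GroupName into (name, start, end)."""
    ordered = sorted(rows, key=lambda r: r[2])  # ORDER BY StartingPoint
    per_group = defaultdict(int)  # ROW_NUMBER() OVER (PARTITION BY GroupName ...)
    buckets = {}  # (GroupName, SubGroupId) -> [MIN(start), MAX(end)]
    for overall_rn, (_id, name, start, end) in enumerate(ordered, start=1):
        per_group[name] += 1
        # Partitioned row number minus unpartitioned row number.
        sub_group_id = per_group[name] - overall_rn
        key = (name, sub_group_id)
        if key in buckets:
            lo, hi = buckets[key]
            buckets[key] = [min(lo, start), max(hi, end)]
        else:
            buckets[key] = [start, end]
    # Equivalent of ORDER BY MIN(StartingPoint) used for SortOrderId.
    return sorted(
        ((n, s, e) for (n, _), (s, e) in buckets.items()),
        key=lambda t: t[1],
    )


for sort_order_id, (name, start, end) in enumerate(islands(ROWS), start=1):
    print(sort_order_id, name, start, end)
# 1 Grp1 2014-01-06 2014-01-08
# 2 Grp2 2014-01-08 2014-01-09
# 3 Grp1 2014-01-09 2014-01-10
# 4 Grp2 2014-01-10 2014-01-11
```

The SubGroupId values can be negative. This does not matter, because they are only used together with GroupName as a grouping key.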
Answer 1 (score: 0)
Not sure, but maybe:
SELECT DISTINCT
GroupName,
MIN(StartingPoint) OVER (PARTITION BY GroupName ORDER BY Id),
MAX(EndingPoint) OVER (PARTITION BY GroupName ORDER BY Id)
FROM table1
Because a windowed aggregate over a partition does not reduce the number of rows, the duplicate rows it produces are removed by DISTINCT.
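One way to check what this query computes is to emulate it. In SQL Server 2012 and later, aggregates accept ORDER BY inside OVER. An ORDER BY with no explicit frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so MIN and MAX become running values per GroupName. The Python sketch below replays that and then applies DISTINCT. The sample rows are copied from the question and the function name `windowed_distinct` is hypothetical. On the sample data it yields five rows rather than the four desired ones, because the running values never reset when the group name changes.

```python
# Sample rows from the question: (Id, GroupName, StartingPoint, EndingPoint).
ROWS = [
    (1, "Grp1", "2014-01-06", "2014-01-07"),
    (2, "Grp1", "2014-01-07", "2014-01-08"),
    (3, "Grp2", "2014-01-08", "2014-01-09"),
    (4, "Grp1", "2014-01-09", "2014-01-10"),
    (5, "Grp2", "2014-01-10", "2014-01-11"),
]


def windowed_distinct(rows):
    """Emulate MIN/MAX ... OVER (PARTITION BY GroupName ORDER BY Id) + DISTINCT.

    The default frame with ORDER BY is a running aggregate from the first row
    of the partition up to the current row. Ids are unique, so RANGE and ROWS
    behave the same here.
    """
    running = {}  # GroupName -> (running MIN(start), running MAX(end))
    out = []
    for _id, name, start, end in sorted(rows):  # ORDER BY Id
        lo, hi = running.get(name, (start, end))
        lo, hi = min(lo, start), max(hi, end)
        running[name] = (lo, hi)
        row = (name, lo, hi)
        if row not in out:  # DISTINCT; first-seen order kept only for readability
            out.append(row)
    return out


for row in windowed_distinct(ROWS):
    print(*row)
```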
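Answer 2 (score: 0)
This is easier with the lag() function in SQL Server 2012. My approach to these problems is to find where each group starts and flag every row with a 1 (group start) or a 0. A cumulative sum of the 1s then gives a new group id.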
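The flag-and-running-sum idea can be sketched in Python. This sketch assumes that the flag compares each row's GroupName with the previous row's in StartingPoint order, which is what LAG(GroupName) OVER (ORDER BY StartingPoint) would return in SQL Server 2012. The 2008 query that follows instead checks whether a row of the same group ends where the current row starts. The sample rows are copied from the question and the function name `islands_by_flags` is hypothetical.

```python
from itertools import accumulate

# Sample rows from the question: (Id, GroupName, StartingPoint, EndingPoint).
ROWS = [
    (1, "Grp1", "2014-01-06", "2014-01-07"),
    (2, "Grp1", "2014-01-07", "2014-01-08"),
    (3, "Grp2", "2014-01-08", "2014-01-09"),
    (4, "Grp1", "2014-01-09", "2014-01-10"),
    (5, "Grp2", "2014-01-10", "2014-01-11"),
]


def islands_by_flags(rows):
    ordered = sorted(rows, key=lambda r: r[2])  # ORDER BY StartingPoint
    # Flag is 1 when the row starts a new group: there is no previous row, or
    # the previous row (the LAG value) has a different GroupName.
    flags = [
        1 if i == 0 or ordered[i - 1][1] != row[1] else 0
        for i, row in enumerate(ordered)
    ]
    group_ids = list(accumulate(flags))  # running SUM(flag) = new group id
    groups = {}  # group id -> (GroupName, MIN(start), MAX(end))
    for gid, (_id, name, start, end) in zip(group_ids, ordered):
        if gid in groups:
            g_name, g_start, g_end = groups[gid]
            groups[gid] = (g_name, min(g_start, start), max(g_end, end))
        else:
            groups[gid] = (name, start, end)
    return [groups[gid] for gid in sorted(groups)]


for gid, row in enumerate(islands_by_flags(ROWS), start=1):
    print(gid, *row)
```

For the sample rows the flags are 1, 0, 1, 1, 1, so the running sum gives group ids 1, 1, 2, 3, 4.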
In SQL Server 2008, you can do this with a correlated subquery (or a join):
with table1_flag as (
select t1.*,
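-- 1 if no row of the same group ends where this row starts (a new group begins here), else 0
case when exists (select 1
from table1 t2
where t2.groupname = t1.groupname and
t2.endingpoint = t1.startingpoint
) then 0 else 1 end as groupstartflag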
from table1 t1
),
table1_flag_cum as (
select tf.*,
(select sum(groupstartflag)
from table1_flag tf2
where tf2.groupname = tf.groupname and
tf2.startingpoint <= tf.startingpoint
) as groupnum
from table1_flag tf
)
select groupnum, groupname,
min(startingpoint) as startingpoint, max(endingpoint) as endingpoint
from table1_flag_cum
group by groupnum, groupname;