我有来自旧系统的人员的记录,我试图将其转换为新系统。在旧系统中,一个人最终可能会在同一位置找到几条记录。他们也可以从位置到另一个,然后返回到以前的位置。以下是一些示例数据:
PersonID | LocationID | StartDate | EndDate
1 | 1 | 1980-07-30 | 2007-07-16
1 | 1 | 2007-07-16 | 2008-01-30
1 | 2 | 2008-01-30 | 2009-03-02
1 | 2 | 2009-03-02 | 2009-11-06
1 | 3 | 2014-07-16 | 2015-01-16
1 | 1 | 2016-01-26 | 2999-12-31
我想折叠这些数据,以便获得任何连续LocationID的日期范围。对于上面的数据,这是我所期望的:
PersonID | LocationID | StartDate | EndDate
1 | 1 | 1980-07-30 | 2008-01-30
1 | 2 | 2008-01-30 | 2009-11-06
1 | 3 | 2014-07-16 | 2015-01-16
1 | 1 | 2016-01-26 | 2999-12-31
我不确定如何做到这一点。我以前曾尝试加入上一条记录,但只有在连续两个位置时才有效,而不是3个或更多(可能有不确定数量的连续记录)。
select
a.PersonID,
a.LocationID,
a.StartDate,
a.EndDate,
case when a.LocationID = b.LocationID then a.PK_ID else b.PK_ID end as NewID
from employees a
left outer join employees b
on a.PersonID = b.PersonID
and a.PK_ID = b.PK_ID - 1
那么,如何编写查询以获得我需要的结果呢?
注意:我们正在治疗2999-12-31'是我们的“空”'日期字段
答案 0 :(得分:1)
对于样本数据,您可以使用行号方法的差异:
select personid, locationid, min(startdate), max(enddate)
from (select e.*,
row_number() over (partition by personid order by startdate) as seqnum_p,
row_number() over (partition by personid, locationid order by startdate) as seqnum_pl
from employees e
) e
group by (seqnum_p - seqnum_pl), personid, locationid;
这假定开始日期和结束日期是连续的。也就是说,同一地点的特定员工没有差距。
答案 1 :(得分:1)
这是一个经典的Gaps-and-Islands (编辑 - 纠正更大的跨度2999)
Select [PersonID]
,[LocationID]
,[StartDate] = min(D)
,[EndDate] = max(D)
From (
Select *
,Grp = Row_Number() over (Order By D) - Row_Number() over (Partition By [PersonID],[LocationID] Order By D)
from YourTable A
Cross Apply (
Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate])
From master..spt_values n1,master..spt_values n2
) B
) G
Group By [PersonID],[LocationID],Grp
Order By [PersonID],min(D)
<强>返回强>
PersonID LocationID StartDate EndDate
1 1 1980-07-30 2008-01-30
1 2 2008-01-30 2009-11-06
1 3 2014-07-16 2015-01-16
1 1 2016-01-26 2999-12-31
使用原始查询
Select [PersonID]
,[LocationID]
,[StartDate] = min(D)
,[EndDate] = max(D)
From (
Select *
,Grp = Row_Number() over (Order By D) - Row_Number() over (Partition By [PersonID],[LocationID] Order By D)
From (
-- Your Original Query
select
a.PersonID,
a.LocationID,
a.StartDate,
a.EndDate,
case when a.LocationID = b.LocationID then a.PK_ID else b.PK_ID end as NewID
from employees a
left outer join employees b
on a.PersonID = b.PersonID
and a.PK_ID = b.PK_ID - 1
) A
Cross Apply (
Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate])
From master..spt_values n1,master..spt_values n2
) B
) G
Group By [PersonID],[LocationID],Grp
Order By [PersonID],min(D)
请求评论
让我们把它分解成它的组成部分。
1)CROSS APPLY部分:这会将单个记录扩展为N条记录。例如:
Declare @YourTable Table ([PersonID] int,[LocationID] int,[StartDate] date,[EndDate] date)
Insert Into @YourTable Values
(1,1,'1980-07-01','1980-07-03' )
,(1,1,'1980-07-02','1980-07-04' ) -- Notice the Overlap
,(1,2,'2008-01-30','2008-02-05')
Select *
from @YourTable A
Cross Apply (
Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate])
From master..spt_values n1,master..spt_values n2
) B
以上查询将生成
2)Grp部分:如果我提供一个简单的例子,也许更容易:
Declare @YourTable Table ([PersonID] int,[LocationID] int,[StartDate] date,[EndDate] date)
Insert Into @YourTable Values
(1,1,'1980-07-01','1980-07-03' )
,(1,1,'1980-07-02','1980-07-04' ) -- Notice the Overlap
,(1,2,'2008-01-30','2008-02-05')
Select *
,Grp = Row_Number() over (Order By D) - Row_Number() over (Partition By [PersonID],[LocationID] Order By D)
,RN1 = Row_Number() over (Order By D)
,RN2 = Row_Number() over (Partition By [PersonID],[LocationID] Order By D)
from @YourTable A
Cross Apply (
Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate])
From master..spt_values n1,master..spt_values n2
) B
以上查询生成:
RN1和RN2是GRP的突破,只是为了说明机制。注意RN1减去RN2等于GRP。一旦我们拥有了GRP,通过
组成为一个简单的聚合问题3)全力拉拢:
Declare @YourTable Table ([PersonID] int,[LocationID] int,[StartDate] date,[EndDate] date)
Insert Into @YourTable Values
(1,1,'1980-07-01','1980-07-03' )
,(1,1,'1980-07-02','1980-07-04' ) -- Notice the Overlap
,(1,2,'2008-01-30','2008-02-05')
Select [PersonID]
,[LocationID]
,[StartDate] = min(D)
,[EndDate] = max(D)
From (
Select *
,Grp = Row_Number() over (Order By D) - Row_Number() over (Partition By [PersonID],[LocationID] Order By D)
from @YourTable A
Cross Apply (
Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate])
From master..spt_values n1,master..spt_values n2
) B
) G
Group By [PersonID],[LocationID],Grp
Order By [PersonID],min(D)
返回