Collapse consecutive similar records into a single record

Time: 2017-05-26 15:57:25

Tags: sql sql-server tsql sql-server-2008-r2

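I have records of people from a legacy system that I am trying to convert to a new system. In the old system, a person can end up with several records at the same location. They can also move from one location to another and then return to a previous location. Here is some sample data: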

PersonID  | LocationID | StartDate  | EndDate
1         | 1          | 1980-07-30 | 2007-07-16
1         | 1          | 2007-07-16 | 2008-01-30
1         | 2          | 2008-01-30 | 2009-03-02
1         | 2          | 2009-03-02 | 2009-11-06
1         | 3          | 2014-07-16 | 2015-01-16
1         | 1          | 2016-01-26 | 2999-12-31

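I would like to collapse this data so that I get one date range for each run of consecutive records with the same LocationID. For the data above, this is what I expect: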

PersonID  | LocationID | StartDate  | EndDate
1         | 1          | 1980-07-30 | 2008-01-30
1         | 2          | 2008-01-30 | 2009-11-06
1         | 3          | 2014-07-16 | 2015-01-16
1         | 1          | 2016-01-26 | 2999-12-31

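I'm not sure how to go about this. I previously tried joining each record to the previous one, but that only works when there are two consecutive records at the same location, not three or more (there may be an indeterminate number of consecutive records).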

select
    a.PersonID,
    a.LocationID,
    a.StartDate,
    a.EndDate,
    case when a.LocationID = b.LocationID then a.PK_ID else b.PK_ID end as NewID
from employees a
left outer join employees b
on a.PersonID = b.PersonID
and a.PK_ID = b.PK_ID - 1

So, how can I write a query to get the results I need?

Note: we are treating '2999-12-31' as our "NULL" date value.

2 Answers:

Answer 0: (score: 1)

For the sample data, you can use the difference-of-row-numbers approach:

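select personid, locationid,
       min(startdate) as startdate,
       max(enddate) as enddate
from (select e.*,
             -- position of the row among all of this person's rows
             row_number() over (partition by personid order by startdate) as seqnum_p,
             -- position of the row among this person's rows at the same location
             row_number() over (partition by personid, locationid order by startdate) as seqnum_pl
      from employees  e
     ) e
-- the difference is constant within each run of consecutive rows at one location
group by (seqnum_p - seqnum_pl), personid, locationid;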

This assumes that the start and end dates are contiguous. That is, there are no gaps for a particular employee at the same location.
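To see why the difference of the two row numbers identifies each run, it can help to look at the intermediate values. The following sketch loads the question's sample rows into a table variable (the name @employees is an assumption standing in for the real table) and lists both sequence numbers alongside their difference:

```sql
-- Sketch only: @employees is a stand-in for the real employees table.
DECLARE @employees TABLE (PersonID int, LocationID int, StartDate date, EndDate date);
INSERT INTO @employees VALUES
 (1, 1, '1980-07-30', '2007-07-16')
,(1, 1, '2007-07-16', '2008-01-30')
,(1, 2, '2008-01-30', '2009-03-02')
,(1, 2, '2009-03-02', '2009-11-06')
,(1, 3, '2014-07-16', '2015-01-16')
,(1, 1, '2016-01-26', '2999-12-31');

-- Show both row numbers and their difference for every source row.
SELECT s.*, s.seqnum_p - s.seqnum_pl AS grp
FROM (SELECT e.PersonID, e.LocationID, e.StartDate, e.EndDate,
             ROW_NUMBER() OVER (PARTITION BY e.PersonID ORDER BY e.StartDate) AS seqnum_p,
             ROW_NUMBER() OVER (PARTITION BY e.PersonID, e.LocationID ORDER BY e.StartDate) AS seqnum_pl
      FROM @employees e
     ) s
ORDER BY s.PersonID, s.StartDate;
```

For this data the difference comes out as 0, 0, 2, 2, 4, 3 in start-date order. The last row is at location 1 again, but three rows for other locations sit between it and the earlier location 1 rows, so its difference is 3 rather than 0 and it forms its own group.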

Answer 1: (score: 1)

This is a classic Gaps-and-Islands problem. (Edit: corrected to handle the larger span out to 2999.)

Select [PersonID]
      ,[LocationID]
      ,[StartDate]  = min(D)
      ,[EndDate]    = max(D)
 From (
        Select *
              ,Grp = Row_Number() over (Partition By [PersonID] Order By D) - Row_Number() over (Partition By [PersonID],[LocationID] Order By D)
         from YourTable A
         Cross Apply (
                        Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate])  
                        From  master..spt_values n1,master..spt_values n2
                     ) B
      ) G
 Group By [PersonID],[LocationID],Grp
 Order By [PersonID],min(D)

Returns

PersonID    LocationID  StartDate   EndDate
1           1           1980-07-30  2008-01-30
1           2           2008-01-30  2009-11-06
1           3           2014-07-16  2015-01-16
1           1           2016-01-26  2999-12-31

Using your original query

Select [PersonID]
      ,[LocationID]
      ,[StartDate]  = min(D)
      ,[EndDate]    = max(D)
 From (
        Select *
              ,Grp = Row_Number() over (Partition By [PersonID] Order By D) - Row_Number() over (Partition By [PersonID],[LocationID] Order By D)
         From (
                -- Your Original Query
                select
                    a.PersonID,
                    a.LocationID,
                    a.StartDate,
                    a.EndDate,
                    case when a.LocationID = b.LocationID then a.PK_ID else b.PK_ID end as NewID
                from employees a
                left outer join employees b
                on a.PersonID = b.PersonID
                and a.PK_ID = b.PK_ID - 1
              ) A
         Cross Apply (
                        Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate])  
                        From  master..spt_values n1,master..spt_values n2
                     ) B
      ) G
 Group By [PersonID],[LocationID],Grp
 Order By [PersonID],min(D)

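Comments, as requested

Let's break it down into its components.

1) The CROSS APPLY portion: this expands a single record into N records, one per day in its date range. For example: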

Declare @YourTable Table ([PersonID] int,[LocationID] int,[StartDate] date,[EndDate] date)
Insert Into @YourTable Values
 (1,1,'1980-07-01','1980-07-03' )
,(1,1,'1980-07-02','1980-07-04' )  -- Notice the Overlap
,(1,2,'2008-01-30','2008-02-05')

Select *
    from @YourTable A
    Cross Apply (
                Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate])  
                From  master..spt_values n1,master..spt_values n2
                ) B

The above query generates one row per day for each source record:

[image: query output]
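The day generator inside the CROSS APPLY can also be looked at in isolation. The sketch below uses two hypothetical variables, @StartDate and @EndDate, in place of one row's date range; the cross join of master..spt_values is used only as a source of enough rows to number.

```sql
-- Sketch only: @StartDate and @EndDate stand in for one row's date range.
DECLARE @StartDate date = '1980-07-01';
DECLARE @EndDate   date = '1980-07-03';

SELECT t.D
FROM (
      -- TOP limits the row count to the number of days in the range (inclusive);
      -- ROW_NUMBER() - 1 is the day offset added to the start date.
      SELECT TOP (DATEDIFF(DAY, @StartDate, @EndDate) + 1)
             D = DATEADD(DAY, -1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), @StartDate)
      FROM master..spt_values n1, master..spt_values n2
     ) t
ORDER BY t.D;
```

This is intended to return one row for each day from @StartDate through @EndDate. In the answer's query the same expression runs once per source row, which is how a single record becomes N daily records.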

2) The Grp portion: this is perhaps easier to follow with a simple example:

Declare @YourTable Table ([PersonID] int,[LocationID] int,[StartDate] date,[EndDate] date)
Insert Into @YourTable Values
 (1,1,'1980-07-01','1980-07-03' )
,(1,1,'1980-07-02','1980-07-04' )  -- Notice the Overlap
,(1,2,'2008-01-30','2008-02-05')

Select *
      ,Grp = Row_Number() over (Partition By [PersonID] Order By D) - Row_Number() over (Partition By [PersonID],[LocationID] Order By D)
      ,RN1 = Row_Number() over (Partition By [PersonID] Order By D)
      ,RN2 = Row_Number() over (Partition By [PersonID],[LocationID] Order By D) 
    from @YourTable A
    Cross Apply (
                Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate])  
                From  master..spt_values n1,master..spt_values n2
                ) B

The above query generates:

[image: query output]

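RN1 and RN2 are the two components of Grp, shown only to illustrate the mechanics. Notice that RN1 minus RN2 equals Grp. Once we have Grp, the rest is a simple aggregation with GROUP BY.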
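The grouping step can be tried on a tiny hand-written set of daily rows. In this sketch, @Days is a hypothetical table standing in for the expanded one-row-per-day set, and the first row number is partitioned by PersonID (an assumption, so that each person is numbered separately):

```sql
-- Sketch only: @Days is a hypothetical stand-in for the expanded daily rows.
DECLARE @Days TABLE (PersonID int, LocationID int, D date);
INSERT INTO @Days VALUES
 (1, 1, '2008-01-01'), (1, 1, '2008-01-02'), (1, 1, '2008-01-03')
,(1, 2, '2008-01-04'), (1, 2, '2008-01-05')
,(1, 1, '2008-01-06'), (1, 1, '2008-01-07');  -- back at location 1

SELECT PersonID, LocationID, StartDate = MIN(D), EndDate = MAX(D)
FROM (
      SELECT PersonID, LocationID, D,
             -- per-person position minus per-person-and-location position
             Grp = ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY D)
                 - ROW_NUMBER() OVER (PARTITION BY PersonID, LocationID ORDER BY D)
      FROM @Days
     ) G
GROUP BY PersonID, LocationID, Grp
ORDER BY PersonID, MIN(D);
```

Here Grp works out to 0, 0, 0, 3, 3, 2, 2 in date order, so the two stays at location 1 land in different groups (0 and 2) and the query returns three ranges: location 1 for 2008-01-01 to 2008-01-03, location 2 for 2008-01-04 to 2008-01-05, and location 1 again for 2008-01-06 to 2008-01-07.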

3) Pulling it all together:

Declare @YourTable Table ([PersonID] int,[LocationID] int,[StartDate] date,[EndDate] date)
Insert Into @YourTable Values
 (1,1,'1980-07-01','1980-07-03' )
,(1,1,'1980-07-02','1980-07-04' )  -- Notice the Overlap
,(1,2,'2008-01-30','2008-02-05')

Select [PersonID]
      ,[LocationID]
      ,[StartDate]  = min(D)
      ,[EndDate]    = max(D)
 From (
        Select *
              ,Grp = Row_Number() over (Partition By [PersonID] Order By D) - Row_Number() over (Partition By [PersonID],[LocationID] Order By D)
            from @YourTable A
            Cross Apply (
                        Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate])  
                        From  master..spt_values n1,master..spt_values n2
                        ) B
      ) G
 Group By [PersonID],[LocationID],Grp
 Order By [PersonID],min(D)

Returns

[image: query output]
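Because Grp is the difference of two row numbers, a calendar gap between two stays at the same location does not start a new group unless rows for another location fall in between. If such gaps should split the range, a commonly used variant (not the method shown above) subtracts a dense rank from the date itself, so that consecutive days share the same anchor date. A minimal sketch, with a hypothetical @Days table standing in for the expanded daily rows:

```sql
-- Sketch only: @Days is a hypothetical stand-in for the expanded daily rows.
DECLARE @Days TABLE (PersonID int, LocationID int, D date);
INSERT INTO @Days VALUES
 (1, 1, '1980-07-01'), (1, 1, '1980-07-02')
,(1, 1, '1980-07-02')                          -- duplicate day from an overlap
,(1, 1, '1980-07-03')
,(1, 1, '1980-07-10'), (1, 1, '1980-07-11');   -- same location after a gap

SELECT PersonID, LocationID, StartDate = MIN(D), EndDate = MAX(D)
FROM (
      SELECT PersonID, LocationID, D,
             -- DENSE_RANK gives duplicate days the same rank, so consecutive
             -- days all map to one anchor date; a date gap shifts the anchor.
             Grp = DATEADD(DAY,
                           -1 * CAST(DENSE_RANK() OVER (PARTITION BY PersonID, LocationID ORDER BY D) AS int),
                           D)
      FROM @Days
     ) G
GROUP BY PersonID, LocationID, Grp
ORDER BY PersonID, MIN(D);
```

With these rows the first four days share the anchor 1980-06-30 and the last two share 1980-07-06, so the query returns two ranges for location 1: 1980-07-01 to 1980-07-03 and 1980-07-10 to 1980-07-11.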