Question

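I have a table containing employment dates. A new row is added every time something changes, and a salary change is the most common kind of change. So the FROM date of the new row equals that person's last TO date plus one (1) day. If my salary changes on 2014-04-01, my previous row ends with a TO date of 2014-03-31 and my new row starts with a FROM date of 2014-04-01.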

I want to get only the employment dates, not the dates that exist because of changes. Look at this table:

SSN         FROM        TO
----------------------------------
0987654321  2011-01-01  2011-12-31
0987654321  2012-01-01  2012-12-31
1234567890  2012-01-01  2012-12-31
0987654321  2013-01-01  2013-12-31
1234567890  2013-01-01  2013-06-30
0987654321  2014-01-01  2014-08-31
1234567890  2016-01-01  2016-12-31
1234567890  2017-01-01  2017-12-31
1234567890  2018-01-01  null

The output I want:

SSN         FROM        TO
----------------------------------
0987654321  2011-01-01  2014-08-31
1234567890  2012-01-01  2013-06-30
1234567890  2016-01-01  null

I thought I could create a field that is one day later than TO:

SELECT SSN, TO, FROM, DATEADD(DAY, 1, TO) AS NEW FROM table

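But I don't know how to match NEW with TO on a different row. Maybe WHERE NOT EXISTS or something similar? I can't get it to work.

Then I thought I could use LAG, but by default the previous row in the table has nothing to do with the next one, and I can't use ORDER BY in a subquery. It is not allowed, and I don't know why (T-SQL?).

FYI, I can't CREATE TABLE or INSERT INTO a table and so on, and I can't declare variables either. We will be getting a module that allows all of this, but right now I don't have those permissions.

Update: The first answer is actually correct, but I noticed another field that interferes with it. One SSN can have several IDs, so the IDs must be kept apart as well. Here is actual data from my table.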

CREATE TABLE Samples (
  SSN varchar(10),
  ID varchar(4),
  FromDate Date,
  ToDate Date
);

INSERT INTO Samples (SSN, ID, FromDate, ToDate) VALUES
  ( '6612140000', '1000', '2005-01-01', '2005-03-31' ),
  ( '6612140000', '1000', '2005-04-01', '2005-09-30' ),
  ( '6612140000', '1000', '2005-10-01', '2006-03-31' ),
  ( '6612140000', '2000', '2005-10-01', '2006-04-30' ),
  ( '6612140000', '1000', '2006-04-01', '2007-03-31' ),
  ( '6612140000', '1000', '2007-04-01', '2008-03-31' ),
  ( '6612140000', '1000', '2008-04-01', '2009-03-31' ),
  ( '6612140000', '1000', '2009-04-01', '2010-03-31' ),
  ( '6612140000', '1000', '2010-04-01', '2010-11-30' ),
  ( '6612140000', '1000', '2010-12-01', '2011-03-31' ),
  ( '6612140000', '1000', '2011-04-01', '2011-08-21' ),
  ( '6612140000', '1000', '2011-08-22', '2011-11-13' ),
  ( '6612140000', '1000', '2011-11-14', '2011-11-30' ),
  ( '6612140000', '1000', '2011-12-01', '2012-01-31' ),
  ( '6612140000', '1000', '2016-07-01', '2017-03-31' ),
  ( '6612140000', '1000', '2017-04-01', '2017-11-30' ),
  ( '6612140000', '1000', '2017-12-01', '2018-03-31' ),
  ( '6612140000', '1000', '2018-04-01', null ),
  ( '7605140000', '1000', '2013-11-01', '2013-11-30' ),
  ( '7605140000', '1000', '2013-12-01', '2013-12-31' ),
  ( '7605140000', '1000', '2014-01-01', '2014-03-31' ),
  ( '7605140000', '1000', '2014-04-01', '2014-12-31' ),
  ( '7605140000', '1000', '2015-05-01', '2015-05-31' ),
  ( '7605140000', '1000', '2015-06-01', '2015-09-30' ),
  ( '7605140000', '1000', '2015-10-01', '2015-10-31' ),
  ( '7605140000', '1000', '2016-01-25', '2016-07-24' ),
  ( '7605140000', '1000', '2016-07-25', '2016-08-31' ),
  ( '7605140000', '1000', '2016-09-01', '2017-03-31' ),
  ( '7605140000', '1000', '2017-04-01', '2017-11-30' ),
  ( '7605140000', '1000', '2017-12-01', null );

This is the code from the answer, to which I tried to add the ID field, without any luck:

with
  FromDates as (
    -- All of the FromDates for each SSN for which there is not
    -- a contiguous preceding period.
    select SO.SSN, SO.ID, SO.FromDate, SO.ToDate,
      Row_Number() over ( partition by SO.SSN order by SO.FromDate ) as RN
      from Samples as SO
      where not exists (
        select 42 from Samples as SI where SI.SSN = SO.SSN and SI.ID = SO.ID and
          SI.ToDate = DateAdd( day, -1, SO.FromDate ) ) ),
  ToDates as (
    -- All of the ToDates for each SSN for which there is not
    -- a contiguous following period.
    select SSN, ID, FromDate, ToDate,
      Row_Number() over ( partition by SSN order by FromDate ) as RN
      from Samples as SO
      where not exists (
        select 42 from Samples as SI where SI.SSN = SO.SSN and SI.ID = SO.ID and
          SI.FromDate = DateAdd( day, 1, SO.ToDate ) ) ),
  Ranges as (
    -- Pair the FromDate and ToDate entries for each SSN.
    select F.SSN, F.ID, F.FromDate, T.ToDate
      from FromDates as F inner join
        ToDates as T on T.SSN = F.SSN and T.ID = F.ID and T.RN = F.RN )
  -- Use any ONE of the following select statements to see what is going on:
  -- select * from FromDates
  -- select * from ToDates
  select * from Ranges
  -- where SSN = '6612140000'
  order by SSN, ID, FromDate

It returns:

SSN         ID    FromDate    ToDate
6612140000  1000  2016-07-01  (null)
7605140000  1000  2013-11-01  2014-12-31
7605140000  1000  2014-03-01  2014-12-31
7605140000  1000  2015-05-01  2015-10-31
7605140000  1000  2015-05-01  2015-10-31
7605140000  1000  2016-01-25  (null)

Answer 1

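The following example summarizes the islands in your data. By changing which of the final select statements is enabled and which are commented out, you can see the intermediate results along the way.

Update: Changed the date comparisons in the CTEs so that they can benefit from indexes on SSN, FromDate and on SSN, ToDate.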

-- Sample data.
declare @Samples table ( SSN VarChar(10), FromDate Date, ToDate Date );
insert into @Samples ( SSN, FromDate, ToDate ) values
  ( '0987654321', '2011-01-01', '2011-12-31' ),
  ( '0987654321', '2012-01-01', '2012-12-31' ),
  ( '1234567890', '2012-01-01', '2012-12-31' ),
  ( '0987654321', '2013-01-01', '2013-12-31' ),
  ( '1234567890', '2013-01-01', '2013-06-30' ),
  ( '0987654321', '2014-01-01', '2014-08-31' ),
  ( '1234567890', '2016-01-01', '2016-12-31' ),
  ( '1234567890', '2017-01-01', '2017-12-31' ),
  ( '1234567890', '2018-01-01', null );
select *
  from @Samples;

-- Sample data made a little easier to read.
select *,
  case when exists (
    select 42 from @Samples as SI where SI.SSN = S.SSN and
      DateDiff( day, S.ToDate, SI.FromDate ) = 1 ) then 1 else 0 end as Continued
  from @Samples as S
  order by SSN, FromDate;

-- Process the data.
with
  FromDates as (
    -- All of the FromDates for each SSN for which there is not
    --   a contiguous preceding period.
    select SO.SSN, SO.FromDate, SO.ToDate,
      Row_Number() over ( partition by SO.SSN order by SO.FromDate ) as RN
      from @Samples as SO
      where not exists (
        select 42 from @Samples as SI where SI.SSN = SO.SSN and
          SI.ToDate = DateAdd( day, -1, SO.FromDate ) ) ),
  ToDates as (
    -- All of the ToDates for each SSN for which there is not
    --   a contiguous following period.
    select SSN, FromDate, ToDate, Row_Number() over ( partition by SSN order by FromDate ) as RN
      from @Samples as SO
      where not exists (
        select 42 from @Samples as SI where SI.SSN = SO.SSN and
          SI.FromDate = DateAdd( day, 1, SO.ToDate ) ) ),
  Ranges as (
    -- Pair the FromDate and ToDate entries for each SSN.
    select F.SSN, F.FromDate, T.ToDate
      from FromDates as F inner join
        ToDates as T on T.SSN = F.SSN and T.RN = F.RN )
  -- Use any ONE of the following select statements to see what is going on:
--  select * from FromDates order by SSN, FromDate;
--  select * from ToDates order by SSN, FromDate;
  select * from Ranges order by SSN, FromDate;
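
To illustrate the update note about indexes: when the date column of the inner table is an argument of a function such as DateDiff, the optimizer typically cannot seek on that column, whereas comparing the bare column with a value computed from the outer row usually allows a seek. The following is only a sketch. It assumes a permanent table named Samples with the columns SSN, FromDate and ToDate, and the index names are made up for illustration (the asker currently lacks the permissions to create indexes).

```sql
-- Hypothetical indexes; the names are invented for this sketch.
create index IX_Samples_SSN_FromDate on Samples ( SSN, FromDate );
create index IX_Samples_SSN_ToDate on Samples ( SSN, ToDate );

-- Harder to seek on: SI.ToDate is wrapped in DateDiff, so the date
-- condition usually has to be checked row by row within each SSN.
select SO.SSN, SO.FromDate
  from Samples as SO
  where not exists (
    select 42 from Samples as SI where SI.SSN = SO.SSN and
      DateDiff( day, SI.ToDate, SO.FromDate ) = 1 );

-- Easier to seek on: the bare column SI.ToDate is compared with a value
-- computed from the outer row, so an index on ( SSN, ToDate ) can be used.
select SO.SSN, SO.FromDate
  from Samples as SO
  where not exists (
    select 42 from Samples as SI where SI.SSN = SO.SSN and
      SI.ToDate = DateAdd( day, -1, SO.FromDate ) );
```

Both queries pick the same rows for Date columns. Whether the optimizer actually chooses a seek depends on the data and statistics, so comparing the execution plans is the way to confirm it.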

Of course, if the Id values within an SSN are actually to be treated independently, then the answer changes to:

-- Sample data.
declare @Samples as Table ( SSN VarChar(10), Id VarChar(4), FromDate Date, ToDate Date );
insert into @Samples ( SSN, ID, FromDate, ToDate ) values
    ( '6612140000', '1000', '2005-01-01', '2005-03-31' ),
    ( '6612140000', '1000', '2005-04-01', '2005-09-30' ),
    ( '6612140000', '1000', '2005-10-01', '2006-03-31' ), 
    ( '6612140000', '2000', '2005-10-01', '2006-04-30' ),
    ( '6612140000', '1000', '2006-04-01', '2007-03-31' ),
    ( '6612140000', '1000', '2007-04-01', '2008-03-31' ),
    ( '6612140000', '1000', '2008-04-01', '2009-03-31' ),
    ( '6612140000', '1000', '2009-04-01', '2010-03-31' ),
    ( '6612140000', '1000', '2010-04-01', '2010-11-30' ),
    ( '6612140000', '1000', '2010-12-01', '2011-03-31' ),
    ( '6612140000', '1000', '2011-04-01', '2011-08-21' ),
    ( '6612140000', '1000', '2011-08-22', '2011-11-13' ),
    ( '6612140000', '1000', '2011-11-14', '2011-11-30' ),
    ( '6612140000', '1000', '2011-12-01', '2012-01-31' ),
    ( '6612140000', '1000', '2016-07-01', '2017-03-31' ),
    ( '6612140000', '1000', '2017-04-01', '2017-11-30' ),
    ( '6612140000', '1000', '2017-12-01', '2018-03-31' ),
    ( '6612140000', '1000', '2018-04-01', null ),
    ( '7605140000', '1000', '2013-11-01', '2013-11-30' ),
    ( '7605140000', '1000', '2013-12-01', '2013-12-31' ),
    ( '7605140000', '1000', '2014-01-01', '2014-03-31' ),
    ( '7605140000', '1000', '2014-03-01', '2014-12-31' ),
    ( '7605140000', '1000', '2014-04-01', '2014-12-31' ),
    ( '7605140000', '1000', '2015-05-01', '2015-05-31' ),
--  ( '7605140000', '1000', '2015-05-01', '2015-05-31' ), -- Duplicate row?!
    ( '7605140000', '1000', '2015-06-01', '2015-09-30' ),
--  ( '7605140000', '1000', '2015-06-01', '2015-09-30' ), -- Duplicate row?!
    ( '7605140000', '1000', '2015-10-01', '2015-10-31' ),
--  ( '7605140000', '1000', '2015-10-01', '2015-10-31' ), -- Duplicate row?!
    ( '7605140000', '1000', '2016-01-25', '2016-07-24' ),
    ( '7605140000', '1000', '2016-07-25', '2016-08-31' ),
    ( '7605140000', '1000', '2016-09-01', '2017-03-31' ),
    ( '7605140000', '1000', '2017-04-01', '2017-11-30' ),
    ( '7605140000', '1000', '2017-12-01', null );
select *
  from @Samples;

-- Sample data made a little easier to read.
select *,
  case when exists (
    select 42 from @Samples as SI where SI.SSN = S.SSN and SI.Id = S.Id and
      DateDiff( day, S.ToDate, SI.FromDate ) = 1 ) then 1 else 0 end as Continued
  from @Samples as S
  order by SSN, Id, FromDate;

-- Process the data.
with
  FromDates as (
    -- All of the FromDates for each SSN and Id for which there is not
    --   a contiguous preceding period.
    select SO.SSN, SO.Id, SO.FromDate, SO.ToDate,
      Row_Number() over ( partition by SO.SSN, SO.Id order by SO.FromDate ) as RN
      from @Samples as SO
      where not exists (
        select 42 from @Samples as SI where SI.SSN = SO.SSN and SI.Id = SO.Id and
          SI.ToDate = DateAdd( day, -1, SO.FromDate ) ) ),
  ToDates as (
    -- All of the ToDates for each SSN and Id for which there is not
    --   a contiguous following period.
    select SO.SSN, SO.Id, SO.FromDate, SO.ToDate,
      Row_Number() over ( partition by SSN, SO.Id order by FromDate ) as RN
      from @Samples as SO
      where not exists (
        select 42 from @Samples as SI where SI.SSN = SO.SSN and SI.Id = SO.Id and
          SI.FromDate = DateAdd( day, 1, SO.ToDate ) ) ),
  Ranges as (
    -- Pair the FromDate and ToDate entries for each SSN and Id.
    select F.SSN, F.Id, F.FromDate, T.ToDate
      from FromDates as F inner join
        ToDates as T on T.SSN = F.SSN and T.Id = F.Id and T.RN = F.RN )
  -- Use any ONE of the following select statements to see what is going on:
--  select * from FromDates order by SSN, Id, FromDate;
--  select * from ToDates order by SSN, Id, FromDate;
  select * from Ranges order by SSN, Id, FromDate;
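
Three rows in the sample data above are commented out as possible duplicates. If the real table does contain exact duplicate rows, each copy gets its own row number, which is one plausible reason for the repeated 2015-05-01 row in the output shown in the question. One way to guard against that, without needing any extra permissions, is to put a select distinct CTE in front and read from it instead of the base table. This is a minimal sketch under those assumptions: the permanent table is named Samples as in the question, and the CTE name Deduped is made up here.

```sql
with
  Deduped as (
    -- Assumption: exact duplicate rows carry no meaning and can be collapsed.
    select distinct SSN, ID, FromDate, ToDate
      from Samples ),
  FromDates as (
    -- Rows that start an island: no contiguous preceding period.
    select SO.SSN, SO.ID, SO.FromDate,
      Row_Number() over ( partition by SO.SSN, SO.ID order by SO.FromDate ) as RN
      from Deduped as SO
      where not exists (
        select 42 from Deduped as SI where SI.SSN = SO.SSN and SI.ID = SO.ID and
          SI.ToDate = DateAdd( day, -1, SO.FromDate ) ) ),
  ToDates as (
    -- Rows that end an island: no contiguous following period.
    select SO.SSN, SO.ID, SO.ToDate,
      Row_Number() over ( partition by SO.SSN, SO.ID order by SO.FromDate ) as RN
      from Deduped as SO
      where not exists (
        select 42 from Deduped as SI where SI.SSN = SO.SSN and SI.ID = SO.ID and
          SI.FromDate = DateAdd( day, 1, SO.ToDate ) ) ),
  Ranges as (
    -- Pair the n-th start with the n-th end within each SSN and ID.
    select F.SSN, F.ID, F.FromDate, T.ToDate
      from FromDates as F inner join
        ToDates as T on T.SSN = F.SSN and T.ID = F.ID and T.RN = F.RN )
  select * from Ranges order by SSN, ID, FromDate;
```

This removes exact duplicates only. Overlapping periods, such as the row starting on 2014-03-01, still produce an extra island and would need separate treatment.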

Answer 2

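This is a gaps & islands problem, and the standard solution is based on nested analytic functions:

#1: Compare each row with the previous row and flag it with 1 when a new group starts.

#2: Calculate a cumulative sum over the flag, which assigns a number to each group of rows.

#3: Now you can do whatever you want with those groups.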

-- data must be correct, i.e. a Slowly Changing Dimension without gaps or overlapping periods
with calcFlag as
 (
   select SSN, Id, FromDate, ToDate,
      -- new group starts when the previous end date
      -- is not the current start date -1
      case when lag(ToDate)
                over (partition by SSN, Id
                      order by FromDate ) = DateAdd( day, -1, FromDate )
           then 0
           else 1
      end as flag
   from samples
 ),
calcGroup as 
 (
   select SSN, Id, FromDate, ToDate, flag,
      -- Cumulative Sum to dynamically assign group number
      sum(flag)
      over ( partition by SSN, Id 
             order by FromDate 
             rows unbounded preceding ) as grp#
   from calcFlag
 )
select SSN, Id, 
   min(FromDate), 
   -- either max date or NULL 
   nullif(max(coalesce(ToDate, '9999-12-31')), '9999-12-31')
from calcGroup
group by SSN, Id, grp# -- include dynamically calculated group number
order by SSN, Id, min(FromDate)
;
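
Since the asker cannot declare variables or create tables, one way to try the query above is to feed it a few rows through a VALUES table constructor inside a CTE. Note that the order by sits inside over ( ... ), which is allowed in a CTE, unlike a trailing order by. This is a minimal sketch, assuming SQL Server 2012 or later (for lag and the rows frame). The CTE named samples stands in for the real table, the rows are a small subset of the question's data, and the aliases IslandFrom and IslandTo are added here for readability.

```sql
with samples as
 (
   -- Stand-in for the real table: a few rows taken from the question.
   select SSN, Id,
      cast(FromDate as date) as FromDate,
      cast(ToDate as date) as ToDate
   from ( values
      ( '6612140000', '1000', '2005-10-01', '2006-03-31' ),
      ( '6612140000', '2000', '2005-10-01', '2006-04-30' ),
      ( '6612140000', '1000', '2006-04-01', '2007-03-31' ),
      ( '6612140000', '1000', '2017-12-01', '2018-03-31' ),
      ( '6612140000', '1000', '2018-04-01', null )
   ) as v ( SSN, Id, FromDate, ToDate )
 ),
calcFlag as
 (
   select SSN, Id, FromDate, ToDate,
      -- 1 marks the first row of a new island, 0 a continuation
      case when lag(ToDate)
                over (partition by SSN, Id
                      order by FromDate ) = DateAdd( day, -1, FromDate )
           then 0
           else 1
      end as flag
   from samples
 ),
calcGroup as
 (
   select SSN, Id, FromDate, ToDate,
      -- running total of the flags numbers the islands 1, 2, 3, ...
      sum(flag)
      over ( partition by SSN, Id
             order by FromDate
             rows unbounded preceding ) as grp#
   from calcFlag
 )
select SSN, Id,
   min(FromDate) as IslandFrom,
   -- an open-ended (null) ToDate wins over any real end date
   nullif(max(coalesce(ToDate, '9999-12-31')), '9999-12-31') as IslandTo
from calcGroup
group by SSN, Id, grp#
order by SSN, Id, IslandFrom
;
```

For these five rows the query should return three islands: Id 1000 from 2005-10-01 to 2007-03-31, Id 1000 from 2017-12-01 with a null end date, and Id 2000 from 2005-10-01 to 2006-04-30.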

Find the first occurrence of contiguous start/end dates