Group rows if dates overlap, and rank them

Asked: 2017-01-05 21:29:56

Tags: sql sql-server tsql sql-server-2016

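My situation is that I have a table of transactions with start and end dates. The problem is that these transaction dates often overlap one another, and I want to group those scenarios together.

For example, in the scenario below, transaction #1 is the "root" transaction, while #2-#4 overlap with #1 and/or each other. Transaction #5, however, does not overlap with anything, so it is a new "root" transaction.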

+----------------+-----------+-----------+----------------------------------+
| Transaction ID | StartDate |  EndDate  |                                  |
+----------------+-----------+-----------+----------------------------------+
|              1 | 1/1/2017  | 1/3/2017  | root transaction                 |
|              2 | 1/2/2017  | 1/6/2017  | overlaps with #1                 |
|              3 | 1/5/2017  | 1/10/2017 | overlaps with #2                 |
|              4 | 1/3/2017  | 1/13/2017 | overlaps with #2 and #3          |
|              5 | 1/15/2017 | 1/20/2017 | no overlap, new root transaction |
+----------------+-----------+-----------+----------------------------------+

Here is how I would like the output to look. I want to:

  1. Identify the root transaction (column 4)
  2. Rank the transactions within a chain by EndDate, so that the root is always = 1

+----------------+-----------+-----------+------------------+------+
| Transaction ID |   Start   |    End    | Root Transaction | Rank |
+----------------+-----------+-----------+------------------+------+
|              1 | 1/1/2017  | 1/3/2017  |                1 |    1 |
|              2 | 1/2/2017  | 1/6/2017  |                1 |    2 |
|              3 | 1/5/2017  | 1/10/2017 |                1 |    3 |
|              4 | 1/3/2017  | 1/13/2017 |                1 |    4 |
|              5 | 1/15/2017 | 1/20/2017 |                5 |    1 |
+----------------+-----------+-----------+------------------+------+

How can I solve this in SQL?

3 answers:

Answer 0: (score: 3)

Here is one approach using OUTER APPLY:

-- Sample data from the question
Declare @YourTable table ([Transaction ID] int,StartDate date,EndDate date)
Insert Into @YourTable values
(1,'1/1/2017','1/3/2017'),
(2,'1/2/2017','1/6/2017'),
(3,'1/5/2017','1/10/2017'),
(4,'1/3/2017','1/13/2017'),
(5,'1/15/2017','1/20/2017')

Select [Transaction ID]
      ,[Start] = StartDate
      ,[End]   = EndDate
      ,[Root Transaction]=Grp
      ,[Rank]  = Row_Number() over (Partition By Grp Order by [Transaction ID])
 From (
        Select A.*
               -- Running max of Flag*ID carries the most recent root's ID forward
              ,Grp = max(Flag*[Transaction ID]) over (Order By [Transaction ID])
         From (
                -- Flag = 1 starts a new group; 0 overlaps an earlier transaction
                Select A.*,Flag = IsNull(B.Flg,1)
                 From @YourTable A
                 Outer Apply (
                              Select Top 1 Flg=0 
                               From  @YourTable 
                               -- General interval-overlap test; unlike a pair of BETWEEN checks,
                               -- it also catches an earlier range that fully contains A's range
                               Where StartDate <= A.EndDate
                                 and EndDate   >= A.StartDate
                                 and [Transaction ID]<A.[Transaction ID]
                              ) B
              ) A
      ) A

Returns

[Image: result set of the query above]


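EDIT - Some commentary

In the OUTER APPLY, Flag is set to 1 or 0. 1 indicates a new group; 0 indicates the record overlaps an existing range.

Then, in the next query "up", we use a window function to apply the Grp code (Flag * Trans ID). Remember, a new group is 1 and an existing one is 0, so the product is the Trans ID for a root and 0 for everything else. The window function then takes the max of this product as it walks through the transactions.

The final query simply applies the Rank using a window function partitioned by Grp and ordered by Trans ID.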

If it helps to visualize:

The first subquery (the OUTER APPLY) produces

[Image: output of the first subquery]

The second subquery produces

[Image: output of the second subquery]
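
To inspect those intermediate values directly, a query along the following lines could be run. It is only a minimal sketch that assumes the sample rows from the question; the column name FlagTimesId is made up for illustration, and the overlap check is written as the general interval-overlap test. The values in the comments were worked out by hand from the sample data.

```sql
-- Sketch: expose the intermediate columns for the question's sample rows.
Declare @YourTable table ([Transaction ID] int, StartDate date, EndDate date)
Insert Into @YourTable values
(1,'20170101','20170103'),
(2,'20170102','20170106'),
(3,'20170105','20170110'),
(4,'20170103','20170113'),
(5,'20170115','20170120')

Select A.[Transaction ID]
      ,A.Flag                                       -- 1, 0, 0, 0, 1
      ,FlagTimesId = A.Flag * A.[Transaction ID]    -- 1, 0, 0, 0, 5 (hypothetical column name)
      ,Grp = max(A.Flag * A.[Transaction ID])
             over (Order By A.[Transaction ID])     -- 1, 1, 1, 1, 5
 From (
        Select T.*, Flag = IsNull(B.Flg, 1)
         From @YourTable T
         Outer Apply (
                      -- Any earlier transaction whose range overlaps T's range
                      Select Top 1 Flg = 0
                       From  @YourTable P
                       Where P.StartDate <= T.EndDate
                         and P.EndDate   >= T.StartDate
                         and P.[Transaction ID] < T.[Transaction ID]
                     ) B
      ) A
 Order By A.[Transaction ID]
```

Because a root contributes its own ID and every other row contributes 0, the running max simply repeats the most recent root's ID until the next root appears. This relies on the IDs increasing in the same order in which the chains occur.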

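Answer 1: (score: 1)

This is an example of "gaps and islands". For your data, you can identify the "islands" by determining where each one starts, that is, where a record does not overlap any previous record. You can then use row_number() to get the ranking.

So, here is one method: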

select t.*,
       min(transactionId) over (partition by island) as start,
       row_number() over (partition by island order by endDate) as rnk
from (select t.*,
             -- running count of island starts = island number
             sum(startIslandFlag) over (order by startDate) as island
      from (select t.*,
                   -- 1 when no record that started earlier is still open at this start date
                   (case when not exists (select 1
                                          from t t2
                                          where t2.startdate < t.startdate and
                                                t2.enddate >= t.startdate
                                         )
                         then 1 else 0
                    end) as startIslandFlag
            from t
           ) t
      ) t;

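Notes:

  • If the lowest transaction id is not the "root", the code may need to be adjusted to get the transaction id with the minimum start date.
  • If there are duplicate start dates, the cumulative sum may need to be adjusted (using an explicit range window).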
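
For the first note, one possible adjustment is to pick the root with first_value() ordered by start date instead of min(transactionId). The following is a sketch under assumptions, not part of the original answer: the table variable @t, its three rows, and the alias rootId are hypothetical, chosen so that the earliest transaction has the highest id.

```sql
-- Hypothetical data: transaction 10 starts first but has the highest id.
declare @t table (transactionId int, startDate date, endDate date);
insert into @t values
(10, '20170101', '20170103'),
(2,  '20170102', '20170106'),
(3,  '20170115', '20170120');

select t.*,
       -- root = the transaction with the earliest start date in the island
       -- (transactionId is only a tie-breaker for equal start dates)
       first_value(transactionId) over (partition by island
                                        order by startDate, transactionId) as rootId,
       row_number() over (partition by island order by endDate) as rnk
from (select t.*,
             sum(startIslandFlag) over (order by startDate) as island
      from (select t.*,
                   (case when not exists (select 1
                                          from @t t2
                                          where t2.startDate < t.startDate and
                                                t2.endDate >= t.startDate
                                         )
                         then 1 else 0
                    end) as startIslandFlag
            from @t t
           ) t
      ) t;
```

With these rows, transactions 10 and 2 form one island whose rootId is 10 (min(transactionId) would have returned 2), and transaction 3 is its own island.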

Answer 2: (score: 1)

Identify the root transactions:

with roots as (
    select *
    from [tran] as t1 -- TRAN is a reserved keyword in T-SQL, so the table name must be bracketed
    where not exists (
        select 1
        from [tran] as t2
        where t2.Transaction_ID < t1.Transaction_ID
        -- general interval-overlap test; also covers t1 fully containing t2
        and t2.StartDate <= t1.EndDate
        and t2.EndDate >= t1.StartDate
        )
    )

Pair each root with the next root to capture all the overlapping transactions that fall between them:

select t.Transaction_ID,
    t.StartDate as [Start],
    t.EndDate as [End],
    r1.Transaction_ID as Root_Transaction,
    row_number() over (partition by r1.Transaction_ID order by t.EndDate) as [Rank]
from roots as r1
left join roots as r2 -- left join keeps the last root, which has no following root
on r2.Transaction_ID > r1.Transaction_ID
and not exists ( --this "not exists" makes sure r1 and r2 are consecutive roots
    select 1
    from roots as r3
    where r3.Transaction_ID > r1.Transaction_ID
    and r3.Transaction_ID < r2.Transaction_ID
    )
inner join [tran] as t -- TRAN is a reserved keyword in T-SQL, hence the brackets
on t.Transaction_ID >= r1.Transaction_ID
and (r2.Transaction_ID is null or t.Transaction_ID < r2.Transaction_ID)
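
As an alternative to the self-join on roots, the next root could also be looked up with lead(), which is available from SQL Server 2012 onward. This swaps in a different technique from the one in the answer and is only a sketch: the table variable @tran stands in for the real table, and the names root_ranges, Root_ID and Next_Root_ID are made up for illustration.

```sql
-- Sample rows from the question, held in a table variable for the sketch.
declare @tran table (Transaction_ID int, StartDate date, EndDate date);
insert into @tran values
(1, '20170101', '20170103'),
(2, '20170102', '20170106'),
(3, '20170105', '20170110'),
(4, '20170103', '20170113'),
(5, '20170115', '20170120');

with roots as (
    -- a root overlaps no transaction with a lower id
    select t1.Transaction_ID
    from @tran as t1
    where not exists (
        select 1
        from @tran as t2
        where t2.Transaction_ID < t1.Transaction_ID
        and t2.StartDate <= t1.EndDate
        and t2.EndDate >= t1.StartDate
        )
    ),
root_ranges as (
    -- pair each root with the following root; NULL for the last root
    select Transaction_ID as Root_ID,
        lead(Transaction_ID) over (order by Transaction_ID) as Next_Root_ID
    from roots
    )
select t.Transaction_ID,
    t.StartDate as [Start],
    t.EndDate as [End],
    rr.Root_ID as Root_Transaction,
    row_number() over (partition by rr.Root_ID order by t.EndDate) as [Rank]
from root_ranges as rr
inner join @tran as t
on t.Transaction_ID >= rr.Root_ID
and (rr.Next_Root_ID is null or t.Transaction_ID < rr.Next_Root_ID)
order by t.Transaction_ID;
```

For the sample rows the roots are 1 and 5, so transactions 1 to 4 are assigned to root 1 with ranks 1 to 4 by EndDate, and transaction 5 is its own root with rank 1. Like the answer's query, this assumes that transaction ids increase in the order in which the chains occur.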