Question

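I have a transactions table with start and end dates. The problem is that the transaction dates often overlap one another, and I want to group those cases together.

For example, in the case below, transaction #1 is the "root" transaction, while #2-#4 overlap with #1 and/or each other. However, transaction #5 doesn't overlap with anything, so it is a new "root" transaction.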

+----------------+-----------+-----------+----------------------------------+
| Transaction ID | StartDate |  EndDate  |                                  |
+----------------+-----------+-----------+----------------------------------+
|              1 | 1/1/2017  | 1/3/2017  | root transaction                 |
|              2 | 1/2/2017  | 1/6/2017  | overlaps with #1                 |
|              3 | 1/5/2017  | 1/10/2017 | overlaps with #2                 |
|              4 | 1/3/2017  | 1/13/2017 | overlaps with #2 and #3          |
|              5 | 1/15/2017 | 1/20/2017 | no overlap, new root transaction |
+----------------+-----------+-----------+----------------------------------+

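Below is what I'd like the output to look like. I want to

Identify the root transaction (column 4)
Rank the transactions in a chain by EndDate, so that the root is always = 1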

+----------------+-----------+-----------+------------------+------+
| Transaction ID |   Start   |    End    | Root Transaction | Rank |
+----------------+-----------+-----------+------------------+------+
|              1 | 1/1/2017  | 1/3/2017  |                1 |    1 |
|              2 | 1/2/2017  | 1/6/2017  |                1 |    2 |
|              3 | 1/5/2017  | 1/10/2017 |                1 |    3 |
|              4 | 1/3/2017  | 1/13/2017 |                1 |    4 |
|              5 | 1/15/2017 | 1/20/2017 |                5 |    1 |
+----------------+-----------+-----------+------------------+------+

How can I solve this in SQL?

Answer 1

Here is one approach using OUTER APPLY

Declare @YourTable table ([Transaction ID] int,StartDate date,EndDate date)
Insert Into @YourTable values
(1,'1/1/2017','1/3/2017'),
(2,'1/2/2017','1/6/2017'),
(3,'1/5/2017','1/10/2017'),
(4,'1/3/2017','1/13/2017'),
(5,'1/15/2017','1/20/2017')

Select [Transaction ID]
      ,[Start] = StartDate
      ,[End]   = EndDate
      ,[Root Transaction]=Grp
      ,[Rank]  = Row_Number() over (Partition By Grp Order by [Transaction ID])
 From (
        Select A.*
              ,Grp = max(Flag*[Transaction ID]) over (Order By [Transaction ID])
         From (
                Select A.*,Flag = IsNull(B.Flg,1)
                 From @YourTable A
                 Outer Apply (
                              Select Top 1 Flg=0 
                               From  @YourTable 
                               Where StartDate <= A.EndDate    -- general overlap test; also catches an
                                 and EndDate   >= A.StartDate  -- earlier row that fully contains A
                                 and [Transaction ID]<A.[Transaction ID]
                              ) B
              ) A
      ) A

For the sample data, this returns the output requested in the question.

Edit - some commentary

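In the OUTER APPLY, Flag is set to 1 or 0. 1 indicates a new group; 0 indicates that the record overlaps an existing range.

Then, in the next query "up", we use a window function to apply the Grp code (Flag * Trans ID). Remember, new groups are 1 and existing ones are 0. The window function takes the running maximum of this product as it walks through the transactions, so every row inherits the ID of the most recent row that started a group.

The final query simply applies the Rank using a window function partitioned by Grp and ordered by Trans ID.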

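If it helps to visualize: the first subquery (the OUTER APPLY) produces the Flag column, and the second subquery produces the Grp column.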
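To make those two intermediate results concrete, here is a small Python sketch that mirrors the Flag and Grp steps on the sample rows from the question. It is only an illustration outside the database: the function name `flags_and_groups` is made up for this example, and the overlap check is the general interval test (other start <= this end and other end >= this start), which is an assumption about what "overlap" should mean here.

```python
from datetime import date

# Sample rows from the question: (transaction_id, start_date, end_date)
rows = [
    (1, date(2017, 1, 1), date(2017, 1, 3)),
    (2, date(2017, 1, 2), date(2017, 1, 6)),
    (3, date(2017, 1, 5), date(2017, 1, 10)),
    (4, date(2017, 1, 3), date(2017, 1, 13)),
    (5, date(2017, 1, 15), date(2017, 1, 20)),
]


def flags_and_groups(rows):
    """Return (transaction_id, flag, grp) tuples, processed in ID order."""
    ordered = sorted(rows)
    result = []
    running_max = 0
    for tid, start, end in ordered:
        # Flag = 0 if any lower-ID row overlaps this one, else 1 (new group)
        overlaps = any(
            o_start <= end and o_end >= start
            for o_tid, o_start, o_end in ordered
            if o_tid < tid
        )
        flag = 0 if overlaps else 1
        # Grp = running maximum of Flag * Transaction ID
        running_max = max(running_max, flag * tid)
        result.append((tid, flag, running_max))
    return result


for row in flags_and_groups(rows):
    print(row)
```

On the sample data the flags come out as 1, 0, 0, 0, 1 and the groups as 1, 1, 1, 1, 5, which is why partitioning by Grp separates transaction #5 from the first chain.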

Answer 2

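This is an example of "gaps and islands". For your data, you can identify the "islands" by determining where each one starts, that is, where a record does not overlap with any previous record. Then you can use row_number() to get the ranking.

So, here is one approach: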

select t.*,
       min(transactionId) over (partition by island) as start,
       row_number() over (partition by island order by endDate) as rnk
from (select t.*,
             sum(startIslandFlag) over (order by startDate) as island
      from (select t.*,
                   (case when not exists (select 1
                                          from t t2
                                          where t2.startdate < t.startdate and
                                                t2.enddate >= t.startdate
                                         )
                         then 1 else 0
                    end) as startIslandFlag
            from t
           ) t
      ) t;

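Notes:

If the lowest transaction ID is not the "root", the code may need a tweak to get the transaction ID with the smallest start date.
If there are duplicate start dates in the data, the cumulative sum may need a tweak (to use an explicit range window).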
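The start-of-island idea can also be traced in plain Python, which may help when checking the query against a handful of rows. This is a rough sketch rather than a translation of the SQL: the function name `islands` is hypothetical, and it assumes start dates are distinct, so it does not address the duplicate-start-date case mentioned in the notes.

```python
from datetime import date

# Sample rows from the question: (transaction_id, start_date, end_date)
rows = [
    (1, date(2017, 1, 1), date(2017, 1, 3)),
    (2, date(2017, 1, 2), date(2017, 1, 6)),
    (3, date(2017, 1, 5), date(2017, 1, 10)),
    (4, date(2017, 1, 3), date(2017, 1, 13)),
    (5, date(2017, 1, 15), date(2017, 1, 20)),
]


def islands(rows):
    """Return sorted (transaction_id, root_id, rank) tuples."""
    labelled = []
    island = 0
    # Walk the rows in start-date order, like the cumulative sum in the query
    for tid, start, end in sorted(rows, key=lambda r: r[1]):
        # A row starts an island when no earlier-starting row is still open
        starts_island = not any(
            o_start < start and o_end >= start for _, o_start, o_end in rows
        )
        if starts_island:
            island += 1
        labelled.append((island, tid, end))

    out = []
    for isl in sorted({label[0] for label in labelled}):
        members = [label for label in labelled if label[0] == isl]
        root = min(tid for _, tid, _ in members)  # lowest ID in the island
        # Rank within the island by end date
        by_end = sorted(members, key=lambda m: m[2])
        for rank, (_, tid, _) in enumerate(by_end, start=1):
            out.append((tid, root, rank))
    return sorted(out)


print(islands(rows))
```

For the sample rows this gives transactions 1 to 4 the root 1 with ranks 1 to 4, and transaction 5 the root 5 with rank 1.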

Answer 3

Identify the root transactions:

with roots as (
    select *
    from [tran] as t1 -- TRAN is a reserved word in T-SQL, so the table name is bracketed
    where not exists (
        select 1
        from [tran] as t2
        where t2.Transaction_ID < t1.Transaction_ID
        -- general overlap test: also catches t2 lying entirely inside t1
        and t2.StartDate <= t1.EndDate
        and t2.EndDate >= t1.StartDate
        )
    )

Create a double-root system to capture all the overlaps between them

select t.Transaction_ID,
    t.StartDate as [Start],
    t.EndDate as [End],
    r1.Transaction_ID as Root_Transaction,
    row_number() over (partition by r1.Transaction_ID order by t.EndDate) as [Rank]
from roots as r1
left join roots as r2 -- left join keeps the last root, which has no following root
on r2.Transaction_ID > r1.Transaction_ID
and not exists ( --this "not exists" makes sure r1 and r2 are consecutive roots
    select 1
    from roots as r3
    where r3.Transaction_ID > r1.Transaction_ID
    and r3.Transaction_ID < r2.Transaction_ID
    )
inner join [tran] as t
on t.Transaction_ID >= r1.Transaction_ID
and (t.Transaction_ID < r2.Transaction_ID or r2.Transaction_ID is null)
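As a cross-check of the root-based grouping, the following Python sketch assigns each transaction to the nearest root whose ID is less than or equal to its own, then ranks by end date within each root. The function name `rank_by_root` is invented for this illustration, and the overlap check is the general interval test, which is an assumption about the intended meaning of "overlap".

```python
from bisect import bisect_right
from datetime import date

# Sample rows from the question: (transaction_id, start_date, end_date)
rows = [
    (1, date(2017, 1, 1), date(2017, 1, 3)),
    (2, date(2017, 1, 2), date(2017, 1, 6)),
    (3, date(2017, 1, 5), date(2017, 1, 10)),
    (4, date(2017, 1, 3), date(2017, 1, 13)),
    (5, date(2017, 1, 15), date(2017, 1, 20)),
]


def rank_by_root(rows):
    """Return sorted (transaction_id, root_id, rank) tuples."""
    ordered = sorted(rows)
    # A root is a row that no lower-ID row overlaps (IDs come out ascending)
    roots = [
        tid
        for tid, start, end in ordered
        if not any(
            o_tid < tid and o_start <= end and o_end >= start
            for o_tid, o_start, o_end in ordered
        )
    ]
    groups = {root: [] for root in roots}
    for tid, start, end in ordered:
        # Nearest root at or below this ID; the lowest ID is always a root
        root = roots[bisect_right(roots, tid) - 1]
        groups[root].append((end, tid))

    out = []
    for root, members in groups.items():
        # Rank within each root's chain by end date
        for rank, (_, tid) in enumerate(sorted(members), start=1):
            out.append((tid, root, rank))
    return sorted(out)


print(rank_by_root(rows))
```

Because every transaction looks backwards for its root, the final root needs no special handling: there is no "next root" to bound it.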

Group rows if their dates overlap, and rank them
