How to determine consecutive date counts/days (consecutive rows) in SQL Server "finding islands"

Asked: 2018-07-07 17:32:02

Tags: sql sql-server tsql sql-server-2012-express

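I have a student table and I want to know how long each student's classes/training lasted. I want to count consecutive days, excluding weekends. A class has a start date and an end date. For example, student ID S1 can book a class in January and then book again in February; I want to know the number of days for the January booking and for the February booking, not counting weekends. Basically, per student ID I am looking for consecutive dates from StartDate to EndDate with no breaks other than weekends.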

SELECT 
 [ID]
,[StartDate]
,[EndDate]
,[BookingDays] AS Consecutive_Booking
FROM StudentBooking

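The student classification (Type) works like this: if the student has booked 5 days or 2 classes within the last 3 months (start date to end date, Monday to Friday), they are a Resident; otherwise they are a Visitor. Start and end dates are only ever recorded on Monday to Friday. Note that the dates for student ID 1 are consecutive, so they should be counted as one block (02/01/2018 - 12/01/2018); the second block is 22/01 - 26/01.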

I want to reproduce the table below.

ID   StartDate  EndDate     Duration     Type
1   02/01/2018  05/01/2018              ==> Please note: continuous dates
1   08/01/2018  12/01/2018   9           Resident
1   22/01/2018  26/01/2018   5           Resident 
2   23/01/2018  26/01/2018   4           Visitor
3   29/01/2018  31/01/2018   3           Visitor

1 answer:

Answer 0: (score: 0)

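Here is my approach to your problem.

In the CTE "comparison", I join every record with itself and with all later records of the same student. This gives me a possible start of a contiguous training block (from the left side of the join) and a possible end of such a block (from the right side of the join). Using CROSS APPLY, I compute two values:

  1. the workdays from the start of the possible first interval to the end of the possible last interval
  2. the workdays of just the possible last interval in the chain.

Over the latter value, I use a window function to build a running total of the workdays of the individual intervals between the possible start and the possible end. You tagged the question with "SQL 2012", so this window function should be available.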
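The workday arithmetic can be checked in isolation. The following is a minimal sketch that applies the same formula as the query below to a few date pairs taken from the question (read as dd/mm/yyyy); the derived table `v`, its column names, and the `date` casts are assumptions made for illustration only.

```sql
-- Illustrative only: literal dates taken from the question.
-- Day number 0 is 1900-01-01, a Monday, so n % 7 = 5 is a Saturday
-- and n % 7 = 6 is a Sunday.
SELECT v.StartDate, v.EndDate,
       (d.e - d.s)                        -- calendar days in the half-open range [s, e)
       - (d.e / 7 - d.s / 7)              -- minus the Sundays in that range
       - ((d.e + 1) / 7 - (d.s + 1) / 7)  -- minus the Saturdays in that range
       AS Workdays
FROM (VALUES
        (CAST('20180102' AS date), CAST('20180105' AS date)),  -- first booking only
        (CAST('20180108' AS date), CAST('20180112' AS date)),  -- second booking only
        (CAST('20180102' AS date), CAST('20180112' AS date)),  -- first start to second end
        (CAST('20180102' AS date), CAST('20180126' AS date))   -- first start to third end
     ) v (StartDate, EndDate)
CROSS APPLY (VALUES (
        DATEDIFF(day, 0, v.StartDate),
        1 + DATEDIFF(day, 0, v.EndDate))
     ) d (s, e);
-- Workdays for the four pairs: 4, 5, 9, 19
```

The first two bookings add up to 4 + 5 = 9, which equals the workdays from the first start to the second end, so they form one contiguous block. Adding the third booking gives 4 + 5 + 5 = 14, which is less than the 19 workdays from the first start to the third end, so there is a gap before it.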

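In the next CTE ("sorting"), I restrict the previous result to those rows where the running total equals the workdays between the first start date and the last end date. That leaves only contiguous blocks. These are then numbered in two ways:

  1. contiguous blocks with the same EndDate are numbered by ascending StartDate
  2. contiguous blocks with the same StartDate are numbered by descending EndDate.

For each EndDate I want the earliest StartDate, and for that StartDate I want only the latest EndDate, so I filter for 1 in both numberings. Here it is: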

WITH
  comparison (ID, StartDate, EndDate, TotalDays, SumSingleDays) AS (
    SELECT bStart.ID, bStart.StartDate, bEnd.EndDate, Workdays.Total
      , SUM(Workdays.Single) OVER (
          PARTITION BY bStart.ID, bStart.StartDate 
          ORDER BY bEnd.StartDate
          ROWS UNBOUNDED PRECEDING)
    FROM StudentBookings bStart
      INNER JOIN StudentBookings bEnd 
        ON bStart.ID = bEnd.ID AND bStart.StartDate <= bEnd.StartDate
      CROSS APPLY (VALUES (
        DATEDIFF(day, 0, bStart.StartDate), 
        DATEDIFF(day, 0, bEnd.StartDate), 
        1+DATEDIFF(day, 0, bEnd.EndDate))
      ) d (s1, s2, e2)
      CROSS APPLY (VALUES (
        (d.e2 - d.s1) - (d.e2/7 - d.s1/7) - ((d.e2+1)/7 - (d.s1+1)/7),
        (d.e2 - d.s2) - (d.e2/7 - d.s2/7) - ((d.e2+1)/7 - (d.s2+1)/7))
      ) Workdays (Total, Single)
  ),
  sorting (ID, StartDate, EndDate, Duration, RowNumStart, RowNumEnd) AS (
    SELECT ID, StartDate, EndDate, TotalDays
      , ROW_NUMBER() OVER (PARTITION BY ID, EndDate ORDER BY StartDate)
      , ROW_NUMBER() OVER (PARTITION BY ID, StartDate ORDER BY EndDate DESC)
    FROM comparison
    WHERE TotalDays = SumSingleDays
  )
SELECT ID, StartDate, EndDate, Duration
  , CASE WHEN Duration >= 5 THEN 'Resident' ELSE 'Visitor' END AS [Type]
FROM sorting 
WHERE (RowNumStart = 1) 
  AND (RowNumEnd = 1)
ORDER BY ID, StartDate;

Result:

[image: query result, not preserved]
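To try the query, the rows from the question can be loaded into a table. This is only a sketch: the question does not give the column types, so `int` and `date` are assumptions, the table name `StudentBookings` follows the answer's query rather than the question's `StudentBooking`, and the dates are read as dd/mm/yyyy.

```sql
-- Assumed schema; adjust names and types to the real table.
CREATE TABLE StudentBookings (
    ID        int  NOT NULL,
    StartDate date NOT NULL,
    EndDate   date NOT NULL
);

-- Sample rows copied from the table in the question.
INSERT INTO StudentBookings (ID, StartDate, EndDate)
VALUES (1, '20180102', '20180105'),
       (1, '20180108', '20180112'),
       (1, '20180122', '20180126'),
       (2, '20180123', '20180126'),
       (3, '20180129', '20180131');
```

With these rows, the query above should merge the first two bookings of student 1 into one block of 9 workdays (Resident), keep the third booking as a block of 5 (Resident), and return blocks of 4 and 3 workdays for students 2 and 3 (both Visitor).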

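This could probably also be solved with Itzik Ben-Gan's interval packing solution, which is a more elegant approach; I will post it as soon as I have worked it out.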
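For comparison, here is a sketch of a different, commonly used gaps-and-islands technique based on LAG and a running sum. It is not the interval packing solution mentioned above, and it rests on assumptions that the original does not state: bookings of one student never overlap, and every StartDate and EndDate falls on Monday to Friday. The CTE names and the columns PrevEnd, IsNewBlock, BlockNo, BlockStart and BlockEnd are made up for the example.

```sql
WITH ordered AS (
    -- Fetch the end date of the student's previous booking.
    SELECT ID, StartDate, EndDate,
           LAG(EndDate) OVER (PARTITION BY ID ORDER BY StartDate) AS PrevEnd
    FROM StudentBookings
),
flagged AS (
    -- A booking opens a new block unless it starts no later than the next
    -- workday after the previous booking ended. Day number % 7 = 4 is a
    -- Friday, because day 0 (1900-01-01) is a Monday.
    SELECT ID, StartDate, EndDate,
           CASE
             WHEN PrevEnd IS NULL THEN 1
             WHEN StartDate > DATEADD(day,
                    CASE WHEN DATEDIFF(day, 0, PrevEnd) % 7 = 4 THEN 3 ELSE 1 END,
                    PrevEnd) THEN 1
             ELSE 0
           END AS IsNewBlock
    FROM ordered
),
grouped AS (
    -- The running sum of the flags numbers the blocks per student.
    SELECT f.ID, f.StartDate, f.EndDate, w.Workdays,
           SUM(f.IsNewBlock) OVER (
             PARTITION BY f.ID ORDER BY f.StartDate
             ROWS UNBOUNDED PRECEDING) AS BlockNo
    FROM flagged f
      CROSS APPLY (VALUES (
        DATEDIFF(day, 0, f.StartDate),
        1 + DATEDIFF(day, 0, f.EndDate))
      ) d (s, e)
      CROSS APPLY (VALUES (
        (d.e - d.s) - (d.e/7 - d.s/7) - ((d.e+1)/7 - (d.s+1)/7))
      ) w (Workdays)
)
SELECT ID,
       MIN(StartDate) AS BlockStart,
       MAX(EndDate)   AS BlockEnd,
       SUM(Workdays)  AS Duration
FROM grouped
GROUP BY ID, BlockNo
ORDER BY ID, BlockStart;
```

This version reads each booking once instead of joining the table to itself, which may matter on larger tables, but it gives wrong durations if bookings overlap.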

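Added

In addition, I count the bookings in every booking block and sum them per student (ID) to make the final "Resident" decision. In the first CTE (comparison), bookings are limited to the last 3 months: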

WITH
  comparison (ID, StartDate, EndDate, TotalDays, CountBookings, SumSingleDays) AS (
    SELECT bStart.ID, bStart.StartDate, bEnd.EndDate, Workdays.Total
      , COUNT(Workdays.Single) OVER (
          PARTITION BY bStart.ID, bStart.StartDate 
          ORDER BY bEnd.StartDate
          ROWS UNBOUNDED PRECEDING)
      , SUM(Workdays.Single) OVER (
          PARTITION BY bStart.ID, bStart.StartDate 
          ORDER BY bEnd.StartDate
          ROWS UNBOUNDED PRECEDING)
    FROM StudentBookings bStart
      INNER JOIN StudentBookings bEnd 
        ON bStart.ID = bEnd.ID AND bStart.StartDate <= bEnd.StartDate
      CROSS APPLY (VALUES (
        DATEDIFF(day, 0, bStart.StartDate), 
        DATEDIFF(day, 0, bEnd.StartDate), 
        1+DATEDIFF(day, 0, bEnd.EndDate))
      ) d (s1, s2, e2)
      CROSS APPLY (VALUES (
        (d.e2 - d.s1) - (d.e2/7 - d.s1/7) - ((d.e2+1)/7 - (d.s1+1)/7),
        (d.e2 - d.s2) - (d.e2/7 - d.s2/7) - ((d.e2+1)/7 - (d.s2+1)/7))
      ) Workdays (Total, Single)
    WHERE bStart.StartDate >= DATEADD(month, -3, GETDATE())
  ),
  sorting (ID, StartDate, EndDate, Duration, CountBookings, RowNumStart, RowNumEnd) AS (
    SELECT ID, StartDate, EndDate, TotalDays, CountBookings
      , ROW_NUMBER() OVER (PARTITION BY ID, EndDate ORDER BY StartDate)
      , ROW_NUMBER() OVER (PARTITION BY ID, StartDate ORDER BY EndDate DESC)
    FROM comparison
    WHERE TotalDays = SumSingleDays
  ),
 counting (ID, StartDate, EndDate, Duration, Bookings) AS (
  SELECT ID, StartDate, EndDate, Duration
    , SUM(CountBookings) OVER (PARTITION BY ID)
  FROM sorting WHERE (RowNumStart = 1) AND (RowNumEnd = 1)
)
SELECT ID, StartDate, EndDate, Duration, Bookings
  , CASE 
      WHEN Duration >= 5 OR Bookings >= 2 THEN 'Resident' ELSE 'Visitor'
    END AS [Type]
FROM counting
ORDER BY ID, StartDate;
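The sample bookings in the question date from January 2018, so a cutoff computed from GETDATE() filters them all out when the query is run later. One way to test against historical data, sketched here with the hypothetical variables `@AsOf` and `@Cutoff`, is to compute the cutoff from a fixed reference date:

```sql
-- Hypothetical reference date for testing; not part of the original query.
DECLARE @AsOf   date = '20180201';
DECLARE @Cutoff date = DATEADD(month, -3, @AsOf);

SELECT @Cutoff AS Cutoff;  -- 2017-11-01
```

In the same batch, the filter in the comparison CTE would then read `WHERE bStart.StartDate >= @Cutoff` instead of using `DATEADD(month, -3, GETDATE())`.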

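Filtering by ClassReference:

ClassReference is taken from the bStart table reference and filtered there. To be able to add this field to the final query, it must also be used in the join to the bEnd table reference, so that only booking intervals with the same ClassReference value are joined into a block: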

WITH
  comparison (ID, ClassReference, StartDate, EndDate, TotalDays, CountBookings, SumSingleDays) AS (
    SELECT bStart.ID, bStart.ClassReference, bStart.StartDate, bEnd.EndDate, Workdays.Total
      , COUNT(Workdays.Single) OVER (
          PARTITION BY bStart.ID, bStart.StartDate 
          ORDER BY bEnd.StartDate
          ROWS UNBOUNDED PRECEDING)
      , SUM(Workdays.Single) OVER (
          PARTITION BY bStart.ID, bStart.StartDate 
          ORDER BY bEnd.StartDate
          ROWS UNBOUNDED PRECEDING)
    FROM StudentBookings bStart
      INNER JOIN StudentBookings bEnd 
        ON bStart.ID = bEnd.ID AND bStart.StartDate <= bEnd.StartDate
       AND bStart.ClassReference = bEnd.ClassReference
      CROSS APPLY (VALUES (
        DATEDIFF(day, 0, bStart.StartDate), 
        DATEDIFF(day, 0, bEnd.StartDate), 
        1+DATEDIFF(day, 0, bEnd.EndDate))
      ) d (s1, s2, e2)
      CROSS APPLY (VALUES (
        (d.e2 - d.s1) - (d.e2/7 - d.s1/7) - ((d.e2+1)/7 - (d.s1+1)/7),
        (d.e2 - d.s2) - (d.e2/7 - d.s2/7) - ((d.e2+1)/7 - (d.s2+1)/7))
      ) Workdays (Total, Single)
    WHERE bStart.StartDate >= DATEADD(month, -3, GETDATE())
      AND bStart.ClassReference IN (N'C1', N'C2')
  ),
  sorting (ID, ClassReference, StartDate, EndDate, Duration, CountBookings, RowNumStart, RowNumEnd) AS (
    SELECT ID, ClassReference, StartDate, EndDate, TotalDays, CountBookings
      , ROW_NUMBER() OVER (PARTITION BY ID, ClassReference, EndDate ORDER BY StartDate)
      , ROW_NUMBER() OVER (PARTITION BY ID, ClassReference, StartDate ORDER BY EndDate DESC)
    FROM comparison
    WHERE TotalDays = SumSingleDays
  ),
  counting (ID, ClassReference, StartDate, EndDate, Duration, Bookings) AS (
    SELECT ID, ClassReference, StartDate, EndDate, Duration
      , SUM(CountBookings) OVER (PARTITION BY ID, ClassReference)
    FROM sorting WHERE (RowNumStart = 1) AND (RowNumEnd = 1)
  )
SELECT ID, ClassReference, StartDate, EndDate, Duration, Bookings
  , CASE 
      WHEN Duration >= 5 OR Bookings >= 2 THEN 'Resident' ELSE 'Visitor'
    END AS [Type]
FROM counting
ORDER BY ID, StartDate;

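Tested with this data:

[image: Test Data]

With a filter on the last 12 months, the query returns:

[image: Result with ClassReference]

So student 1 is a "Resident" in class C2 but a "Visitor" in class C1.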