Select date ranges that do not overlap with another set of date ranges

Date: 2020-01-07 04:11:58

Tags: sql sql-server date

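I have two tables, each containing the start and end dates of several periods. I am looking for an efficient way to find the periods (date ranges) whose dates fall within the ranges in the first table but not within any range in the second table.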

For example, if this is my first table (the dates I want):

start_date  end_date
2001-01-01  2010-01-01
2012-01-01  2015-01-01

and this is my second table (the dates I don't want):

start_date  end_date
2002-01-01  2006-01-01
2003-01-01  2004-01-01
2005-01-01  2009-01-01
2014-01-01  2018-01-01

then the output would look like this:

start_date  end_date
2001-01-01  2001-12-31
2009-01-02  2010-01-01
2012-01-01  2013-12-31

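We can safely assume that the periods in the first table do not overlap, but we cannot assume that the periods in the second table do not overlap.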

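I already have a method that does this, but it is an order of magnitude slower than I can accept, so I'm hoping someone can suggest a faster approach.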

My current approach is:

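  1. Merge table 2 into non-overlapping periods
  2. Find the inverse (complement) of table 2
  3. Join table 1 to the inverted table 2 on overlapping periods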

I'm sure there is a faster way if some of these steps can be combined.
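As a sanity check on the three steps, here is a minimal Python sketch (not T-SQL) that models them on the sample data above. It assumes periods are inclusive (start, end) pairs of `datetime.date`. The names `merge`, `invert`, `intersect`, `TABLE_1` and `TABLE_2` are hypothetical, and the 1900-01-01 / 9999-01-01 bounds are borrowed from the sentinel dates in the detailed query below.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

# Sample data from the question; each period is an inclusive (start, end) pair.
TABLE_1 = [(date(2001, 1, 1), date(2010, 1, 1)),
           (date(2012, 1, 1), date(2015, 1, 1))]
TABLE_2 = [(date(2002, 1, 1), date(2006, 1, 1)),
           (date(2003, 1, 1), date(2004, 1, 1)),
           (date(2005, 1, 1), date(2009, 1, 1)),
           (date(2014, 1, 1), date(2018, 1, 1))]


def merge(periods):
    """Step 1: merge overlapping periods into disjoint, sorted ones."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous merged period, so extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def invert(periods, lo=date(1900, 1, 1), hi=date(9999, 1, 1)):
    """Step 2: the gaps around disjoint, sorted periods, bounded by lo and hi."""
    gaps = []
    cursor = lo
    for start, end in periods:
        if cursor <= start - ONE_DAY:
            gaps.append((cursor, start - ONE_DAY))
        cursor = end + ONE_DAY
    if cursor <= hi:
        gaps.append((cursor, hi))
    return gaps


def intersect(wanted, gaps):
    """Step 3: the overlapping part of every (wanted, gap) pair."""
    result = []
    for s1, e1 in wanted:
        for s2, e2 in gaps:
            start, end = max(s1, s2), min(e1, e2)
            if start <= end:  # inclusive ranges: a one-day overlap counts
                result.append((start, end))
    return sorted(result)


for start, end in intersect(TABLE_1, invert(merge(TABLE_2))):
    print(start, end)
```

It prints the three expected periods (2001-01-01 to 2001-12-31, 2009-01-02 to 2010-01-01, 2012-01-01 to 2013-12-31). Step 3 is a plain nested loop for clarity. It says nothing about how SQL Server would execute the equivalent join.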

In more detail:

/* (1) merge overlapping periods */
WITH
spell_starts AS (
    SELECT [start_date], [end_date]
    FROM table_2 s1
    WHERE NOT EXISTS (
        SELECT 1
        FROM table_2 s2
        WHERE s2.[start_date] < s1.[start_date] 
        AND s1.[start_date] <= s2.[end_date]
    )
),
spell_ends AS (
    SELECT [start_date], [end_date]
    FROM table_2 t1
    WHERE NOT EXISTS (
        SELECT 1 
        FROM table_2 t2
        WHERE t2.[start_date] <= t1.[end_date] 
        AND t1.[end_date] < t2.[end_date]
    )
)
SELECT s.[start_date], MIN(e.[end_date]) as [end_date]
FROM spell_starts s
INNER JOIN spell_ends e
ON s.[start_date] <= e.[end_date]
GROUP BY s.[start_date]

/* (2) invert table 2 (merge_table_2 is the result of step 1) */
SELECT [start_date], [end_date]
FROM (
    /* all forward looking spells */
    SELECT DATEADD(DAY, 1, [end_date]) AS [start_date]
          ,LEAD(DATEADD(DAY, -1, [start_date]), 1, '9999-01-01') OVER ( ORDER BY [start_date] ) AS [end_date]
    FROM merge_table_2

    UNION ALL

    /* back looking spell (to 'origin of time') created separately */
    SELECT '1900-01-01' AS [start_date]
          ,DATEADD(DAY, -1, MIN([start_date])) AS [end_date]
    FROM merge_table_2
) k
WHERE [start_date] <= [end_date]
AND '1900-01-01' <= [start_date] 
AND [end_date] <= '9999-01-01'

/* (3) overlap spells (inverse_merge_table_2 is the result of step 2) */
SELECT IIF(t1.start_date < t2.start_date, t2.start_date, t1.start_date) AS start_date
      ,IIF(t1.end_date < t2.end_date, t1.end_date, t2.end_date) AS end_date
FROM table_1 t1
INNER JOIN inverse_merge_table_2 t2
ON t1.start_date <= t2.end_date    -- ranges are inclusive, so a one-day overlap must still match
AND t2.start_date <= t1.end_date

3 answers:

Answer 0 (score: 3)

Hope this helps. I have added comments to the two CTEs to explain them. Here you go:

drop table if exists table1  -- IF EXISTS needs SQL Server 2016+

select cast('2001-01-01' as date) as start_date, cast('2010-01-01' as date) as end_date into table1
union select '2012-01-01',  '2015-01-01' 

drop table if exists table2

select cast('2002-01-01' as date) as start_date, cast('2006-01-01' as date) as end_date into table2
union select '2003-01-01',  '2004-01-01'
union select '2005-01-01',  '2009-01-01'
union select '2014-01-01',  '2018-01-01'

/***** Solution *****/

-- This CTE puts all the dates into one column
with cte as
(
    select t
    from
    (
        select start_date as t
        from table1
        union all
        select end_date
        from table1

        union all

        select dateadd(day,-1,start_date) -- for table 2 we bring the start date back one day to make sure we have nothing in the forbidden range
        from table2
        union all
        select  dateadd(day,1,end_date) -- for table 2 we bring the end date forward one day to make sure we have nothing in the forbidden range
        from table2
    )a
),
-- This one adds an end date using the lead function
cte2 as (select t as s, coalesce(LEAD(t,1) OVER ( ORDER BY t ),t) as e from cte a)
-- this query gets all intervals not in table2 but in table1
select s, e
from cte2 a 
where not exists(select 1 from table2 b where s between dateadd(day,-1,start_date) and dateadd(day,1,end_date) and e between dateadd(day,-1,start_date) and dateadd(day,1,end_date) )
and exists(select 1 from table1 b where s between start_date and end_date and e between start_date and end_date)
and s <> e
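To follow what this query does, here is a rough Python model of it. The function name `boundary_pairs` is hypothetical, and periods are assumed to be inclusive (start, end) tuples of `datetime.date`. Each step is labelled with the CTE or predicate it mirrors. It illustrates the logic only, not the performance.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def boundary_pairs(table1, table2):
    """Rough model of the answer's query on inclusive (start, end) tuples."""
    # cte: every table1 start/end, plus each table2 start - 1 day and end + 1 day
    points = [d for start, end in table1 for d in (start, end)]
    points += [d for start, end in table2 for d in (start - ONE_DAY, end + ONE_DAY)]
    points.sort()

    # cte2: pair each point with the next one, like
    # COALESCE(LEAD(t, 1) OVER (ORDER BY t), t)
    pairs = [(p, points[i + 1] if i + 1 < len(points) else p)
             for i, p in enumerate(points)]

    def both_inside(s, e, lo, hi):
        return lo <= s <= hi and lo <= e <= hi

    result = []
    for s, e in pairs:
        if s == e:
            continue  # s <> e
        # NOT EXISTS: both ends inside one table2 period widened by a day
        if any(both_inside(s, e, a - ONE_DAY, b + ONE_DAY) for a, b in table2):
            continue
        # EXISTS: both ends inside one table1 period
        if any(both_inside(s, e, a, b) for a, b in table1):
            result.append((s, e))
    return result


TABLE_1 = [(date(2001, 1, 1), date(2010, 1, 1)),
           (date(2012, 1, 1), date(2015, 1, 1))]
TABLE_2 = [(date(2002, 1, 1), date(2006, 1, 1)),
           (date(2003, 1, 1), date(2004, 1, 1)),
           (date(2005, 1, 1), date(2009, 1, 1)),
           (date(2014, 1, 1), date(2018, 1, 1))]

for s, e in boundary_pairs(TABLE_1, TABLE_2):
    print(s, e)
```

On the question's sample data it prints the same three periods as the expected output.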

Answer 1 (score: 2)

If you want performance, you want to use window functions.

The idea is:

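  • Combine the dates from both tables, with an in/out marker for each table.
  • Use cumulative sums to determine where each date range starts being "in" or "out" of each table.
  • You then have a gaps-and-islands problem, where you want to combine adjacent results.
  • Finally, filter for the specific periods you want.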

This looks like:

with dates as (
      select start_date as dte, 1 as in1, 0 as in2
      from table1
      union all
      select dateadd(day, 1, end_date), -1, 0
      from table1
      union all
      select start_date, 0, 1 as in2
      from table2
      union all
      select dateadd(day, 1, end_date), 0, -1
      from table2
     ),
     d as (
      select dte,
             sum(sum(in1)) over (order by dte) as ins_1,
             sum(sum(in2)) over (order by dte) as ins_2
      from dates
      group by dte
     )
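select min(dte) as start_date, max(next_dte) as end_date
from (select f.*, dateadd(day, -1, lead(dte) over (order by dte)) as next_dte,
             row_number() over (order by dte) as seqnum,
             row_number() over (partition by in_out order by dte) as seqnum_2
      from (select d.*,
                   case when ins_1 >= 1 and ins_2 = 0 then 'in' else 'out' end as in_out
            from d
           ) f
     ) d
-- group on in_out as well, so an 'in' island and an 'out' island that happen
-- to share the same seqnum - seqnum_2 value are not merged
group by in_out, (seqnum - seqnum_2)
having max(ins_1) > 0 and max(ins_2) = 0
order by min(dte);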

Here is a db<>fiddle.
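The same sweep idea can be sketched in Python. This sketch assumes inclusive (start, end) tuples of `datetime.date`, and the function name `sweep` is hypothetical. Instead of the `ROW_NUMBER()` difference, it uses `itertools.groupby` to find runs of consecutive "in" rows. That is a swapped-in way of solving the same gaps-and-islands step, not what the query does literally.

```python
from datetime import date, timedelta
from itertools import groupby

ONE_DAY = timedelta(days=1)


def sweep(table1, table2):
    """Sketch of the cumulative-sum approach on inclusive (start, end) tuples."""
    # dates: +1 on each start, -1 on the day after each end, per table;
    # the dict plays the role of GROUP BY dte
    deltas = {}
    for idx, periods in enumerate((table1, table2)):
        for start, end in periods:
            for day, step in ((start, 1), (end + ONE_DAY, -1)):
                deltas.setdefault(day, [0, 0])[idx] += step

    # d: running totals ins_1 / ins_2, like SUM(SUM(...)) OVER (ORDER BY dte)
    rows = []
    ins_1 = ins_2 = 0
    for day in sorted(deltas):
        ins_1 += deltas[day][0]
        ins_2 += deltas[day][1]
        rows.append((day, ins_1 >= 1 and ins_2 == 0))

    # islands: runs of consecutive rows with the same flag; each 'in' island
    # lasts until the day before the next boundary date
    result = []
    for wanted, island in groupby(enumerate(rows), key=lambda item: item[1][1]):
        island = list(island)
        if wanted:
            first_day = island[0][1][0]
            last_idx = island[-1][0]
            # an 'in' island is never the final row, because both running
            # totals are back to zero at the last boundary date
            result.append((first_day, rows[last_idx + 1][0] - ONE_DAY))
    return result


TABLE_1 = [(date(2001, 1, 1), date(2010, 1, 1)),
           (date(2012, 1, 1), date(2015, 1, 1))]
TABLE_2 = [(date(2002, 1, 1), date(2006, 1, 1)),
           (date(2003, 1, 1), date(2004, 1, 1)),
           (date(2005, 1, 1), date(2009, 1, 1)),
           (date(2014, 1, 1), date(2018, 1, 1))]

for start, end in sweep(TABLE_1, TABLE_2):
    print(start, end)
```

For the sample data it returns the three expected periods. With an empty second table it returns the first table's periods unchanged, because separate islands are never merged.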

Answer 2 (score: 0)

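Thanks to @zip and @Gordon for their answers. Both outperform my original approach. However, in my environment and context, the following solution is faster than either of them: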

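WITH acceptable_starts AS (
    /* table1 starts that are not inside any table2 period */
    SELECT a.[start_date] FROM table1 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE a.[start_date] BETWEEN b.[start_date] AND b.[end_date]
    )
    UNION ALL
    /* days just after a table2 period that are not inside another table2 period */
    SELECT DATEADD(DAY, 1, a.[end_date]) FROM table2 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE DATEADD(DAY, 1, a.[end_date]) BETWEEN b.[start_date] AND b.[end_date]
    )
    /* ...and that fall inside some table1 period, so they cannot pair with a later table1 period's end */
    AND EXISTS (
        SELECT 1 FROM table1 AS c
        WHERE DATEADD(DAY, 1, a.[end_date]) BETWEEN c.[start_date] AND c.[end_date]
    )
),
acceptable_ends AS (
    /* table1 ends that are not inside any table2 period */
    SELECT a.[end_date] FROM table1 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE a.[end_date] BETWEEN b.[start_date] AND b.[end_date]
    )
    UNION ALL
    /* days just before a table2 period that are not inside another table2 period */
    SELECT DATEADD(DAY, -1, a.[start_date]) FROM table2 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE DATEADD(DAY, -1, a.[start_date]) BETWEEN b.[start_date] AND b.[end_date]
    )
)
SELECT s.[start_date], MIN(e.[end_date]) AS [end_date]
FROM acceptable_starts AS s
INNER JOIN acceptable_ends AS e
ON s.[start_date] <= e.[end_date]   -- <= so that a single free day (start = end) is kept
GROUP BY s.[start_date]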
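Here is a small Python model of this acceptable-starts / acceptable-ends idea. It assumes inclusive (start, end) tuples of `datetime.date`, and the function name is hypothetical. It also makes one assumption explicit, labelled in a comment: a day just after a table2 period only counts as a start if it lies inside some table1 period. Without that check, such a day could pair with an end belonging to a later table1 period.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def acceptable_periods(table1, table2):
    """Sketch of pairing acceptable starts with the nearest acceptable end."""
    def covered(day):    # inside some table2 period?
        return any(s <= day <= e for s, e in table2)

    def in_table1(day):  # inside some table1 period?
        return any(s <= day <= e for s, e in table1)

    # acceptable starts: table1 starts and days just after a table2 period,
    # if not covered by table2 (the in_table1 check is an assumption added here)
    starts = {s for s, _ in table1 if not covered(s)}
    starts |= {e + ONE_DAY for _, e in table2
               if not covered(e + ONE_DAY) and in_table1(e + ONE_DAY)}

    # acceptable ends: table1 ends and days just before a table2 period
    ends = [e for _, e in table1 if not covered(e)]
    ends += [s - ONE_DAY for s, _ in table2 if not covered(s - ONE_DAY)]

    # pair each start with the nearest end on or after it (MIN(end) ... GROUP BY start)
    result = []
    for start in sorted(starts):
        candidates = [e for e in ends if e >= start]
        if candidates:
            result.append((start, min(candidates)))
    return result


TABLE_1 = [(date(2001, 1, 1), date(2010, 1, 1)),
           (date(2012, 1, 1), date(2015, 1, 1))]
TABLE_2 = [(date(2002, 1, 1), date(2006, 1, 1)),
           (date(2003, 1, 1), date(2004, 1, 1)),
           (date(2005, 1, 1), date(2009, 1, 1)),
           (date(2014, 1, 1), date(2018, 1, 1))]

for start, end in acceptable_periods(TABLE_1, TABLE_2):
    print(start, end)
```

On the sample data it reproduces the expected output. It also finds a single free day (start equal to end), which is why the pairing uses "on or after" rather than "strictly after".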