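I have two tables, each containing the start and end dates of several periods. I am looking for an efficient way to find the periods (date ranges) whose dates fall within the ranges of the first table but not within the ranges of the second.
For example, if this is my first table (with the dates I want)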
start_date end_date
2001-01-01 2010-01-01
2012-01-01 2015-01-01
and this is my second table (with the dates I do not want)
start_date end_date
2002-01-01 2006-01-01
2003-01-01 2004-01-01
2005-01-01 2009-01-01
2014-01-01 2018-01-01
then the output would look like
start_date end_date
2001-01-01 2001-12-31
2009-01-02 2010-01-01
2012-01-01 2013-12-31
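We can safely assume that the periods in the first table do not overlap, but we cannot assume the same for the second table: its periods may overlap.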
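To make the expected output concrete, here is a minimal Python sketch (not part of the original question) that computes the same result by brute force, one calendar day at a time. It assumes that both start and end dates are inclusive, which is what the sample output implies. The names `days_in` and `subtract_by_days` are hypothetical, and the approach is only meant as a slow reference for checking faster queries. Note that it also joins periods that touch each other.

```python
from datetime import date, timedelta

def days_in(periods):
    """Return the set of all calendar days covered by inclusive (start, end) periods."""
    covered = set()
    for start, end in periods:
        day = start
        while day <= end:
            covered.add(day)
            day += timedelta(days=1)
    return covered

def subtract_by_days(wanted, unwanted):
    """Days in `wanted` but not in `unwanted`, regrouped into inclusive periods."""
    keep = sorted(days_in(wanted) - days_in(unwanted))
    result = []
    for day in keep:
        if result and day - result[-1][1] == timedelta(days=1):
            result[-1] = (result[-1][0], day)  # extend the current run of days
        else:
            result.append((day, day))  # start a new run
    return result

# Sample data from the question (assumed inclusive on both ends).
table_1 = [(date(2001, 1, 1), date(2010, 1, 1)), (date(2012, 1, 1), date(2015, 1, 1))]
table_2 = [
    (date(2002, 1, 1), date(2006, 1, 1)),
    (date(2003, 1, 1), date(2004, 1, 1)),
    (date(2005, 1, 1), date(2009, 1, 1)),
    (date(2014, 1, 1), date(2018, 1, 1)),
]

for start, end in subtract_by_days(table_1, table_2):
    print(start, end)
```

On the sample data this prints the three rows of the expected output shown above.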
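I already have a method for doing this, but it is an order of magnitude slower than I can accept, so I am hoping someone can suggest a faster approach.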
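My current approach is as follows:
1. Merge the overlapping periods in table 2.
2. Invert table 2.
3. Overlap table 1 with the inverted table 2.
I am sure there is a faster way if some of these steps can be combined.
In more detail: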
/* (1) merge overlapping periods */
WITH
spell_starts AS (
SELECT [start_date], [end_date]
FROM table_2 s1
WHERE NOT EXISTS (
SELECT 1
FROM table_2 s2
WHERE s2.[start_date] < s1.[start_date]
AND s1.[start_date] <= s2.[end_date]
)
),
spell_ends AS (
SELECT [start_date], [end_date]
FROM table_2 t1
WHERE NOT EXISTS (
SELECT 1
FROM table_2 t2
WHERE t2.[start_date] <= t1.[end_date]
AND t1.[end_date] < t2.[end_date]
)
)
SELECT s.[start_date], MIN(e.[end_date]) as [end_date]
FROM spell_starts s
INNER JOIN spell_ends e
ON s.[start_date] <= e.[end_date]
GROUP BY s.[start_date]
/* (2) invert table 2; merge_table_2 holds the result of step (1) */
SELECT [start_date], [end_date]
FROM (
/* all forward looking spells */
SELECT DATEADD(DAY, 1, [end_date]) AS [start_date]
,LEAD(DATEADD(DAY, -1, [start_date]), 1, '9999-01-01') OVER ( ORDER BY [start_date] ) AS [end_date]
FROM merge_table_2
UNION ALL
/* back looking spell (to 'origin of time') created separately */
SELECT '1900-01-01' AS [start_date]
,DATEADD(DAY, -1, MIN([start_date])) AS [end_date]
FROM merge_table_2
) k
WHERE [start_date] <= [end_date]
AND '1900-01-01' <= [start_date]
AND [end_date] <= '9999-01-01'
/* (3) overlap spells; inverse_merge_table_2 holds the result of step (2) */
SELECT IIF(t1.start_date < t2.start_date, t2.start_date, t1.start_date) AS start_date
,IIF(t1.end_date < t2.end_date, t1.end_date, t2.end_date) AS end_date
FROM table_1 t1
INNER JOIN inverse_merge_table_2 t2
ON t1.start_date < t2.end_date
AND t2.start_date < t1.end_date
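The three steps can also be sketched outside SQL, which may help when checking the result of each stage. The following Python sketch is an illustration under assumptions, not the code from the question: the function names are hypothetical, dates are treated as inclusive, and the `origin` and `horizon` defaults simply reuse the 1900-01-01 and 9999-01-01 sentinels from step (2). Unlike the join in step (3), it keeps overlaps that are a single day long.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def merge_periods(periods):
    """Step 1: merge overlapping inclusive periods."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def invert_periods(merged, origin=date(1900, 1, 1), horizon=date(9999, 1, 1)):
    """Step 2: return the gaps between merged periods, bounded by origin and horizon."""
    gaps = []
    gap_start = origin
    for start, end in merged:
        if gap_start <= start - ONE_DAY:
            gaps.append((gap_start, start - ONE_DAY))
        gap_start = end + ONE_DAY
    if gap_start <= horizon:
        gaps.append((gap_start, horizon))
    return gaps

def intersect_periods(left, right):
    """Step 3: pairwise intersection of two lists of inclusive periods."""
    result = []
    for l_start, l_end in left:
        for r_start, r_end in right:
            start, end = max(l_start, r_start), min(l_end, r_end)
            if start <= end:
                result.append((start, end))
    return sorted(result)

# Sample data from the question.
table_1 = [(date(2001, 1, 1), date(2010, 1, 1)), (date(2012, 1, 1), date(2015, 1, 1))]
table_2 = [
    (date(2002, 1, 1), date(2006, 1, 1)),
    (date(2003, 1, 1), date(2004, 1, 1)),
    (date(2005, 1, 1), date(2009, 1, 1)),
    (date(2014, 1, 1), date(2018, 1, 1)),
]

merged = merge_periods(table_2)
inverse = invert_periods(merged)
for start, end in intersect_periods(table_1, inverse):
    print(start, end)
```

On the sample data, step 1 leaves two merged periods (2002-01-01 to 2009-01-01 and 2014-01-01 to 2018-01-01), and the final loop prints the three expected rows.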
Answer 0 (score: 3)
Hope this helps. I have commented the two CTEs I used, for explanation purposes. Here you go:
drop table table1
select cast('2001-01-01' as date) as start_date, cast('2010-01-01' as date) as end_date into table1
union select '2012-01-01', '2015-01-01'
drop table table2
select cast('2002-01-01' as date) as start_date, cast('2006-01-01' as date) as end_date into table2
union select '2003-01-01', '2004-01-01'
union select '2005-01-01', '2009-01-01'
union select '2014-01-01', '2018-01-01'
/***** Solution *****/
-- This cte put all dates into one column
with cte as
(
select t
from
(
select start_date as t
from table1
union all
select end_date
from table1
union all
select dateadd(day,-1,start_date) -- for table 2 we bring the start date back one day to make sure we have nothing in the forbidden range
from table2
union all
select dateadd(day,1,end_date) -- for table 2 we bring the end date forward one day to make sure we have nothing in the forbidden range
from table2
)a
),
-- This one adds an end date using the lead function
cte2 as (select t as s, coalesce(LEAD(t,1) OVER ( ORDER BY t ),t) as e from cte a)
-- this query gets all intervals not in table2 but in table1
select s, e
from cte2 a
where not exists(select 1 from table2 b where s between dateadd(day,-1,start_date) and dateadd(day,1,end_date) and e between dateadd(day,-1,start_date) and dateadd(day,1,end_date) )
and exists(select 1 from table1 b where s between start_date and end_date and e between start_date and end_date)
and s <> e
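Answer 1 (score: 2)
If you want performance, then you want to use window functions.
The idea is:
- Combine the dates from both tables into one list, marking each period start with +1 and the day after each period end with -1.
- Use cumulative sums to track, at each date, how many periods of each table are open.
- Treat the result as a gaps-and-islands problem: combine consecutive dates that are inside table 1 and outside table 2.
This looks like: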
with dates as (
select start_date as dte, 1 as in1, 0 as in2
from table1
union all
select dateadd(day, 1, end_date), -1, 0
from table1
union all
select start_date, 0, 1 as in2
from table2
union all
select dateadd(day, 1, end_date), 0, -1
from table2
),
d as (
select dte,
sum(sum(in1)) over (order by dte) as ins_1,
sum(sum(in2)) over (order by dte) as ins_2
from dates
group by dte
)
select min(dte), max(next_dte)
from (select d.*, dateadd(day, -1, lead(dte) over (order by dte)) as next_dte,
row_number() over (order by dte) as seqnum,
row_number() over (partition by case when ins_1 >= 1 and ins_2 = 0 then 'in' else 'out' end order by dte) as seqnum_2
from d
) d
group by (seqnum - seqnum_2)
having max(ins_1) > 0 and max(ins_2) = 0
order by min(dte);
Here is a db<>fiddle.
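The counting idea behind the query can be followed more easily in a procedural form. Below is a minimal Python sketch of the same sweep, written as an illustration rather than a translation of the SQL: `subtract_by_sweep` is a hypothetical name, dates are assumed inclusive, and instead of the `row_number` difference trick it simply opens and closes a run as the counters change.

```python
from collections import defaultdict
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def subtract_by_sweep(wanted, unwanted):
    """Sweep over boundary dates, tracking how many periods of each table are open."""
    deltas = defaultdict(lambda: [0, 0])  # date -> [change in table 1, change in table 2]
    for start, end in wanted:
        deltas[start][0] += 1
        deltas[end + ONE_DAY][0] -= 1  # a period stops counting the day after it ends
    for start, end in unwanted:
        deltas[start][1] += 1
        deltas[end + ONE_DAY][1] -= 1

    result = []
    open_1 = open_2 = 0  # running totals, like the cumulative sums ins_1 and ins_2
    run_start = None
    for day in sorted(deltas):
        open_1 += deltas[day][0]
        open_2 += deltas[day][1]
        inside = open_1 >= 1 and open_2 == 0
        if inside and run_start is None:
            run_start = day  # an output period begins here
        elif not inside and run_start is not None:
            result.append((run_start, day - ONE_DAY))  # it ended the day before
            run_start = None
    return result

# Sample data from the question.
table_1 = [(date(2001, 1, 1), date(2010, 1, 1)), (date(2012, 1, 1), date(2015, 1, 1))]
table_2 = [
    (date(2002, 1, 1), date(2006, 1, 1)),
    (date(2003, 1, 1), date(2004, 1, 1)),
    (date(2005, 1, 1), date(2009, 1, 1)),
    (date(2014, 1, 1), date(2018, 1, 1)),
]

for start, end in subtract_by_sweep(table_1, table_2):
    print(start, end)
```

The work is one sort of the boundary dates followed by a single pass, which is the same reason the window-function query avoids joining the tables against each other.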
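Answer 2 (score: 0)
Thanks to @zip and @Gordon for their answers. Both are better than my original approach. However, in my environment and context, the following solution was faster than both: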
WITH acceptable_starts AS (
SELECT [start_date] FROM table1 AS a
WHERE NOT EXISTS (
SELECT 1 FROM table2 AS b
WHERE a.[start_date] BETWEEN b.[start_date] AND b.[end_date] -- a table1 start is acceptable only if no table2 period covers it
)
UNION ALL
SELECT DATEADD(DAY, 1, [end_date]) FROM table2 AS a
WHERE NOT EXISTS (
SELECT 1 FROM table2 AS b
WHERE DATEADD(DAY, 1, a.[end_date]) BETWEEN b.[start_date] AND b.[end_date]
)
),
acceptable_ends AS (
SELECT [end_date] FROM table1 AS a
WHERE NOT EXISTS (
SELECT 1 FROM table2 AS b
WHERE a.[end_date] BETWEEN b.[start_date] AND b.[end_date] -- a table1 end is acceptable only if no table2 period covers it
)
UNION ALL
SELECT DATEADD(DAY, -1, [start_date]) FROM table2 AS a
WHERE NOT EXISTS (
SELECT 1 FROM table2 AS b
WHERE DATEADD(DAY, -1, a.[start_date]) BETWEEN b.[start_date] AND b.[end_date]
)
)
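SELECT s.[start_date], MIN(e.[end_date]) AS [end_date]
FROM acceptable_starts AS s
INNER JOIN acceptable_ends AS e
ON s.[start_date] < e.[end_date]
GROUP BY s.[start_date] -- pair each acceptable start with the nearest later acceptable end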
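The endpoint idea in this query can be sketched in Python to check it against the sample data. This is an illustration under assumptions, not the code from the answer: `covered` and `subtract_by_endpoints` are hypothetical names, and dates are assumed inclusive. A candidate start is a table 1 start or the day after a table 2 end. A candidate end is a table 1 end or the day before a table 2 start. A candidate is acceptable when no table 2 period covers it. Like the query, the sketch does not check that candidates taken from table 2 fall inside a table 1 period.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def covered(day, periods):
    """True if `day` lies inside any inclusive (start, end) period."""
    return any(start <= day <= end for start, end in periods)

def subtract_by_endpoints(wanted, unwanted):
    """Pair each acceptable start with the nearest later acceptable end."""
    starts = [s for s, _ in wanted] + [e + ONE_DAY for _, e in unwanted]
    ends = [e for _, e in wanted] + [s - ONE_DAY for s, _ in unwanted]
    ok_starts = sorted({d for d in starts if not covered(d, unwanted)})
    ok_ends = sorted({d for d in ends if not covered(d, unwanted)})
    result = []
    for start in ok_starts:
        later = [end for end in ok_ends if start < end]
        if later:  # starts with no later end are dropped, like the INNER JOIN
            result.append((start, min(later)))
    return result

# Sample data from the question.
table_1 = [(date(2001, 1, 1), date(2010, 1, 1)), (date(2012, 1, 1), date(2015, 1, 1))]
table_2 = [
    (date(2002, 1, 1), date(2006, 1, 1)),
    (date(2003, 1, 1), date(2004, 1, 1)),
    (date(2005, 1, 1), date(2009, 1, 1)),
    (date(2014, 1, 1), date(2018, 1, 1)),
]

for start, end in subtract_by_endpoints(table_1, table_2):
    print(start, end)
```

On the sample data the acceptable starts are 2001-01-01, 2009-01-02, 2012-01-01 and 2018-01-02, and the last one is dropped because no acceptable end follows it.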