Excluding overlapping dates

Date: 2018-04-23 15:34:19

Tags: sql sql-server

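This may be a fairly simple question, but it is still a challenge for me. I have a table with four fields: Person_id, Store_id, startdate and enddate. A given person_id can have many records with different start and end dates. I need to find the gaps, meaning the cases where more than 24 hours pass between the end date of the current record and the start date of the next one. The problem is that a person can have records whose periods overlap.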
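To see why comparing each record only with the record that starts next breaks on overlapping data, here is a minimal Python sketch (an illustration of the logic, not SQL Server code). It assumes the rows of one person and store are `(startdate, enddate)` pairs of `datetime.date`, with `None` standing in for a NULL end date; the name `naive_gaps` is hypothetical.

```python
from datetime import date, timedelta

# Example 1 from the question as (startdate, enddate); None = NULL (still open)
example1 = [
    (date(2010, 4, 8), date(2010, 8, 6)),
    (date(2016, 9, 9), date(2016, 9, 16)),
    (date(2016, 9, 16), date(2016, 10, 3)),
    (date(2016, 10, 3), date(2016, 10, 7)),
    (date(2016, 10, 7), date(2017, 1, 17)),
    (date(2017, 1, 17), date(2018, 4, 5)),
    (date(2017, 6, 16), date(2017, 6, 20)),  # lies inside the previous period
    (date(2018, 4, 5), None),
]


def naive_gaps(rows, threshold=timedelta(hours=24)):
    """Compare each record only with the record that starts next."""
    ordered = sorted(rows, key=lambda r: r[0])
    gaps = []
    for (_, end), (next_start, _) in zip(ordered, ordered[1:]):
        # A gap is more than `threshold` between this end and the next start
        if end is not None and next_start - end > threshold:
            gaps.append((end, next_start))
    return gaps


print(naive_gaps(example1))
```

Besides the real gap from 2010-08-06 to 2016-09-09, this reports a gap from 2017-06-20 to 2018-04-05, even though the 2017-01-17 to 2018-04-05 record covers that whole span: the contained record hides the longer one from the adjacent-row comparison.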

Example 1:

Person_id       Store_ID          Startdate           enddate
10000351067      10000232561      2010-04-08          2010-08-06
10000351067      10000232561      2016-09-09          2016-09-16
10000351067      10000232561      2016-09-16          2016-10-03
10000351067      10000232561      2016-10-03          2016-10-07
10000351067      10000232561      2016-10-07          2017-01-17
10000351067      10000232561      2017-01-17          2018-04-05
**10000351067      10000232561      2017-06-16          2017-06-20**
10000351067      10000232561      2018-04-05          NULL

Example 2:

Person_id       Store_ID          Startdate           enddate
10000193858      10000225875      2016-07-13          2016-08-03
10000193858      10000225875      2016-08-03          2017-05-17
10000193858      10000225875      2017-05-17          2017-06-05
10000193858      10000225875      2017-05-31          2017-06-05
10000193858      10000225875      2017-06-05          2017-06-13
10000193858      10000225875      2017-06-13          2017-08-16
10000193858      10000225875      2017-08-07          2017-08-16
10000193858      10000225875      2017-08-16          2017-08-18
10000193858      10000225875      2017-08-18          2017-08-31
10000193858      10000225875      2017-08-31          2018-01-05
**10000193858      10000225875      2017-11-13          2017-11-20**
10000193858      10000225875      2018-01-05          NULL

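In every case, the record with the earliest startdate must be kept. If several records share the same earliest startdate, the one with the largest enddate must be kept. I tried the query below, but it did not work (I am probably doing something wrong).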
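The keep rule above amounts to a sort key: earliest start first, then largest end. A minimal Python sketch, assuming `(startdate, enddate)` pairs of `datetime.date` and treating a `None` end date (NULL, still open) as later than any real date, which the question does not state explicitly; `survivor_key` and `pick_survivor` are hypothetical names.

```python
from datetime import date


def survivor_key(row):
    """Sort key for the rule: earliest start first; on a tie, the largest end first.

    Assumption: a None end date (NULL, still open) counts as later than any real date.
    """
    start, end = row
    return (
        start,
        end is not None,                             # open end (False) sorts first
        -end.toordinal() if end is not None else 0,  # then later end dates first
    )


def pick_survivor(rows):
    """Return the record to keep from a group of competing records."""
    return min(rows, key=survivor_key)


# Two records from Example 2 that share the end date 2017-06-05
print(pick_survivor([(date(2017, 5, 31), date(2017, 6, 5)),
                     (date(2017, 5, 17), date(2017, 6, 5))]))

# Hypothetical tie on the start date: the later end date wins
print(pick_survivor([(date(2017, 8, 16), date(2017, 8, 18)),
                     (date(2017, 8, 16), date(2017, 8, 31))]))
```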

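CREATE TABLE #ordered_removal_list(
    [ID_New]     [int] IDENTITY(1,1) NOT NULL,
    [person_id]  [bigint] NULL,
    [Store_ID]   [bigint] NULL,
    [started_at] [datetime] NULL,
    [ended_at]   [datetime] NULL
);

-- ORDER BY fixes the order in which IDENTITY values are assigned,
-- so consecutive ID_New values follow start-date order within each person
INSERT INTO #ordered_removal_list (person_id, Store_ID, started_at, ended_at)
SELECT person_id, Store_ID, started_at, ended_at
FROM Temp_Data
ORDER BY person_id, started_at, ended_at DESC;

WITH cte AS
(
    SELECT ord1.person_id, ord1.started_at, ord1.ended_at,
           next1.started_at AS next1_start,
           last1.started_at AS last1_start,
           -- GAP when this period starts more than 23 hours after the previous one ended
           CASE WHEN DATEDIFF(HOUR, last1.ended_at, ord1.started_at) > 23
                THEN 'GAP' ELSE 'NO_GAP' END AS gap
    FROM #ordered_removal_list ord1
    LEFT JOIN #ordered_removal_list next1
           ON next1.[ID_New] = ord1.[ID_New] + 1 AND next1.person_id = ord1.person_id
    LEFT JOIN #ordered_removal_list last1
           ON last1.[ID_New] = ord1.[ID_New] - 1 AND last1.person_id = ord1.person_id
)
SELECT * FROM cte
WHERE gap = 'GAP';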

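I cannot exclude the overlapping dates highlighted (shown in bold) in the examples above. Any suggestions, preferably with code examples?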

Expected result:

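Example 1) If I exclude the overlapping record 10000351067, 10000232561, 2017-06-16, 2017-06-20, the gap between the end of each record and the start of the next one comes out right: that whole period has no gap before the next one.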

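Example 2) If I exclude the overlapping record 10000193858, 10000225875, 2017-11-13, 2017-11-20, the gap between the end of each record and the start of the next one comes out right: that whole period has no gap before the next one.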
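The expected result can also be reached without deleting the contained records first. The Python sketch below swaps in the standard "gaps and islands" idea, which is not from the question: sort by start date and carry the latest end date seen so far, so a record lying inside an earlier one never moves that running end and cannot produce a false gap. It assumes one person and store per list, `datetime.date` values, and `None` for a NULL end date (treated as `date.max`); `find_gaps` is a hypothetical name.

```python
from datetime import date, timedelta


def find_gaps(rows, threshold=timedelta(hours=24)):
    """Return (covered_until, next_start) pairs more than `threshold` apart."""
    # An open end date (None) is treated as the latest possible date
    ordered = sorted((start, end if end is not None else date.max) for start, end in rows)
    gaps = []
    covered_until = None  # latest end date among the records processed so far
    for start, end in ordered:
        if covered_until is not None and start - covered_until > threshold:
            gaps.append((covered_until, start))
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps


example1 = [
    (date(2010, 4, 8), date(2010, 8, 6)), (date(2016, 9, 9), date(2016, 9, 16)),
    (date(2016, 9, 16), date(2016, 10, 3)), (date(2016, 10, 3), date(2016, 10, 7)),
    (date(2016, 10, 7), date(2017, 1, 17)), (date(2017, 1, 17), date(2018, 4, 5)),
    (date(2017, 6, 16), date(2017, 6, 20)), (date(2018, 4, 5), None),
]
example2 = [
    (date(2016, 7, 13), date(2016, 8, 3)), (date(2016, 8, 3), date(2017, 5, 17)),
    (date(2017, 5, 17), date(2017, 6, 5)), (date(2017, 5, 31), date(2017, 6, 5)),
    (date(2017, 6, 5), date(2017, 6, 13)), (date(2017, 6, 13), date(2017, 8, 16)),
    (date(2017, 8, 7), date(2017, 8, 16)), (date(2017, 8, 16), date(2017, 8, 18)),
    (date(2017, 8, 18), date(2017, 8, 31)), (date(2017, 8, 31), date(2018, 1, 5)),
    (date(2017, 11, 13), date(2017, 11, 20)), (date(2018, 1, 5), None),
]

print(find_gaps(example1))
print(find_gaps(example2))
```

For Example 1 only the genuine 2010-08-06 to 2016-09-09 gap remains, and Example 2 has none, which matches the result described above.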

Thanks!

3 answers:

Answer 0 (score: 0)

Here it is (example in PostgreSQL):

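with attendance_ext as (
  select a.*,
         -- earliest start date after this record's start, for the same person and store
         (select min(x.startdate) from attendance x
           where x.person_id = a.person_id
             and x.store_id = a.store_id
             and x.startdate > a.startdate) as nextstart
    from attendance a)
select * from attendance_ext
  -- a gap is more than 24 hours between this record's end and the next start
  where (enddate + interval '24 hours') < nextstart;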

Assuming your table looks like this:

create table attendance (
  person_id int, 
  store_id int, 
  startdate timestamp, 
  enddate timestamp
);

insert into attendance (person_id, store_id, startdate, enddate) 
  values (1, 1, '2010-04-08', '2010-08-06');
insert into attendance (person_id, store_id, startdate, enddate) 
  values (1, 1, '2016-09-09', '2016-09-16');
insert into attendance (person_id, store_id, startdate, enddate) 
  values (1, 1, '2016-09-16', '2016-10-03');
insert into attendance (person_id, store_id, startdate, enddate) 
  values (1, 1, '2016-10-03', '2016-10-07');
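For comparison, here is the same "next later start date" logic as a Python sketch (a port for illustration, not the answerer's code). Following the question, it compares each record's end date, not its start date, with the next later start; it assumes one person and store per list, `datetime.date` values, and `None` for NULL. The name `gaps_next_start` is hypothetical.

```python
from bisect import bisect_right
from datetime import date, timedelta


def gaps_next_start(rows, threshold=timedelta(hours=24)):
    """Flag records whose end plus `threshold` is still before the next later start."""
    starts = sorted(start for start, _ in rows)
    flagged = []
    for start, end in rows:
        i = bisect_right(starts, start)  # index of the first start strictly later than this one
        if end is None or i == len(starts):
            continue  # open-ended record, or no later start (nextstart would be NULL)
        if end + threshold < starts[i]:
            flagged.append((start, end, starts[i]))
    return flagged


example1 = [
    (date(2010, 4, 8), date(2010, 8, 6)), (date(2016, 9, 9), date(2016, 9, 16)),
    (date(2016, 9, 16), date(2016, 10, 3)), (date(2016, 10, 3), date(2016, 10, 7)),
    (date(2016, 10, 7), date(2017, 1, 17)), (date(2017, 1, 17), date(2018, 4, 5)),
    (date(2017, 6, 16), date(2017, 6, 20)), (date(2018, 4, 5), None),
]

for row in gaps_next_start(example1):
    print(row)
```

On Example 1 this also flags the contained 2017-06-16 to 2017-06-20 record: its next later start (2018-04-05) is compared with its own short end date rather than with the longer period around it.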

Answer 1 (score: 0)

It looks like you are trying to exclude rows that are covered by other rows. If so, try this query:

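The idea of excluding rows that another row covers can be sketched in Python as follows. This is a hypothetical illustration, not the query from this answer: it keeps a record unless some other record's period fully contains it, treats a `None` end date as open-ended, and keeps only the first of two identical records. It assumes a single person and store; `drop_covered` is a made-up name.

```python
from datetime import date


def drop_covered(rows):
    """Keep a record unless another record's period fully contains it.

    A None end date is treated as open-ended; of two identical records,
    only the first one is kept.
    """
    norm = [(start, end if end is not None else date.max) for start, end in rows]
    kept = []
    for i, (start, end) in enumerate(norm):
        covered = any(
            j != i
            and other_start <= start
            and end <= other_end
            and ((other_start, other_end) != (start, end) or j < i)
            for j, (other_start, other_end) in enumerate(norm)
        )
        if not covered:
            kept.append(rows[i])
    return kept


example1 = [
    (date(2010, 4, 8), date(2010, 8, 6)), (date(2016, 9, 9), date(2016, 9, 16)),
    (date(2016, 9, 16), date(2016, 10, 3)), (date(2016, 10, 3), date(2016, 10, 7)),
    (date(2016, 10, 7), date(2017, 1, 17)), (date(2017, 1, 17), date(2018, 4, 5)),
    (date(2017, 6, 16), date(2017, 6, 20)), (date(2018, 4, 5), None),
]

for row in drop_covered(example1):
    print(row)
```

The pairwise check compares every pair of records, so it is quadratic in the number of records per person; that is fine for a handful of rows per person.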
Answer 2 (score: 0)

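Here is a solution that appears to work for the supplied data. It works by checking that no overlapping preceding record (p) supersedes the current record (c). An overlapping record supersedes the current record if it starts before the current record, or starts at the same time but ends after it. A NULL in the start-date or end-date column is treated as the beginning or the end of time, respectively.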
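The supersede rule can be written as a small predicate. This Python sketch assumes `(startdate, enddate)` pairs of `datetime.date` with `None` for NULL, mapped to `date.min` and `date.max` as the beginning and end of time; it uses the same strict overlap test as the query below, so records that merely touch do not overlap. The helper names are hypothetical.

```python
from datetime import date


def _start(d):
    """NULL start date = beginning of time."""
    return date.min if d is None else d


def _end(d):
    """NULL end date = end of time."""
    return date.max if d is None else d


def overlaps(p, c):
    """Strict overlap: p starts before c ends and c starts before p ends."""
    return _start(p[0]) < _end(c[1]) and _start(c[0]) < _end(p[1])


def supersedes(p, c):
    """p supersedes c when they overlap and p starts earlier, or starts together but ends later."""
    if not overlaps(p, c):
        return False
    ps, pe, cs, ce = _start(p[0]), _end(p[1]), _start(c[0]), _end(c[1])
    return ps < cs or (ps == cs and pe > ce)


print(supersedes((date(2017, 1, 17), date(2018, 4, 5)), (date(2017, 6, 16), date(2017, 6, 20))))
```

Two identical records never supersede each other under this predicate; the answer breaks such ties with the row number described next.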

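To establish record precedence, add a row-number column (RN); it also prevents a record from being compared with itself. When records are compared, a preceding record has a lower row number than the current record.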
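A rough Python equivalent of that `row_number()` column, as a sketch only: rows are assumed to be `(person_id, store_id, startdate, enddate)` tuples with `None` for NULL, ordered following the answer's rule that a NULL start is the beginning of time and a NULL end is the end of time. `order_key` and `number_rows` are hypothetical names.

```python
from datetime import date
from itertools import groupby


def order_key(row):
    """Sort key mirroring PARTITION BY person_id, store_id and the ORDER BY rule."""
    person_id, store_id, start, end = row
    return (
        person_id,
        store_id,
        start is not None,                           # NULL start = beginning of time, sorts first
        start if start is not None else date.min,
        end is not None,                             # NULL end = end of time, sorts first
        -end.toordinal() if end is not None else 0,  # then later end dates first
    )


def number_rows(rows):
    """Prepend an rn that restarts at 1 for every (person_id, store_id) pair."""
    numbered = []
    ordered = sorted(rows, key=order_key)
    for _, group in groupby(ordered, key=lambda r: (r[0], r[1])):
        for rn, row in enumerate(group, start=1):
            numbered.append((rn, *row))
    return numbered


rows = [
    (10000193858, 10000225875, date(2017, 5, 31), date(2017, 6, 5)),
    (10000193858, 10000225875, date(2017, 5, 17), date(2017, 6, 5)),
    (10000193858, 10000225875, date(2018, 1, 5), None),
    (10000193858, 10000225875, date(2017, 5, 17), date(2017, 5, 20)),  # hypothetical concurrent start
]
for r in number_rows(rows):
    print(r)
```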

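Once that is done, all that is left is to check for overlaps. See this SQL Fiddle for an example (note that I added a sample record with a concurrent start time to the second data set to test that condition):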

with dta as (
  select row_number()
    over (partition by person_id, store_id
          order by case when startdate is null then 0 else 1 end  -- NULL start = beginning of time: sort first
                 , startdate
                 , case when enddate is null then 1 else 2 end     -- NULL end = end of time: sort first
                 , enddate desc) rn
       , a.*
    from YourData a
 )
 select * from dta c
 where not exists (
     select 1 from dta p
      where p.person_id = c.person_id
        and p.store_id = c.store_id
        -- Establish precedence
        and p.rn < c.rn
        -- Detect overlaps
        and (p.startdate is null or p.startdate < c.enddate or c.enddate is null)
        and (c.startdate is null or c.startdate < p.enddate or p.enddate is null)
   )
 order by Person_id, store_id, startdate;

**Results**

| rn |   Person_id |    Store_ID |            Startdate |              enddate |
|----|-------------|-------------|----------------------|----------------------|
|  1 | 10000193858 | 10000225875 | 2016-07-13T00:00:00Z | 2016-08-03T00:00:00Z |
|  2 | 10000193858 | 10000225875 | 2016-08-03T00:00:00Z | 2017-05-17T00:00:00Z |
|  3 | 10000193858 | 10000225875 | 2017-05-17T00:00:00Z | 2017-06-05T00:00:00Z |
|  5 | 10000193858 | 10000225875 | 2017-06-05T00:00:00Z | 2017-06-13T00:00:00Z |
|  6 | 10000193858 | 10000225875 | 2017-06-13T00:00:00Z | 2017-08-16T00:00:00Z |
|  8 | 10000193858 | 10000225875 | 2017-08-16T00:00:00Z | 2017-08-18T00:00:00Z |
|  9 | 10000193858 | 10000225875 | 2017-08-18T00:00:00Z | 2017-08-31T00:00:00Z |
| 11 | 10000193858 | 10000225875 | 2017-08-31T00:00:00Z | 2018-01-05T00:00:00Z |
| 13 | 10000193858 | 10000225875 | 2018-01-05T00:00:00Z |               (null) |
|  1 | 10000351067 | 10000232561 | 2010-04-08T00:00:00Z | 2010-08-06T00:00:00Z |
|  2 | 10000351067 | 10000232561 | 2016-09-09T00:00:00Z | 2016-09-16T00:00:00Z |
|  3 | 10000351067 | 10000232561 | 2016-09-16T00:00:00Z | 2016-10-03T00:00:00Z |
|  4 | 10000351067 | 10000232561 | 2016-10-03T00:00:00Z | 2016-10-07T00:00:00Z |
|  5 | 10000351067 | 10000232561 | 2016-10-07T00:00:00Z | 2017-01-17T00:00:00Z |
|  6 | 10000351067 | 10000232561 | 2017-01-17T00:00:00Z | 2018-04-05T00:00:00Z |
|  8 | 10000351067 | 10000232561 | 2018-04-05T00:00:00Z |               (null) |
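To check the query's logic outside SQL Server, here is a Python port of the same two steps: number each record, then keep it only if no lower-numbered record overlaps it. It is a sketch under assumptions: one person and store per list, `datetime.date` values, `None` for NULL, and hypothetical helper names (`lo`, `hi`, `rank`, `not_superseded`).

```python
from datetime import date


def lo(d):
    """NULL start date = beginning of time."""
    return date.min if d is None else d


def hi(d):
    """NULL end date = end of time."""
    return date.max if d is None else d


def rank(rows):
    """rn within one person/store: earliest start first, then open or later end first."""
    def key(row):
        start, end = row
        return (start is not None, lo(start), end is not None, -hi(end).toordinal())
    return [(rn, s, e) for rn, (s, e) in enumerate(sorted(rows, key=key), start=1)]


def not_superseded(rows):
    """Keep record c unless a lower-ranked record p overlaps it (the NOT EXISTS test)."""
    ranked = rank(rows)
    return [
        (rn_c, cs, ce)
        for rn_c, cs, ce in ranked
        if not any(
            rn_p < rn_c and lo(ps) < hi(ce) and lo(cs) < hi(pe)
            for rn_p, ps, pe in ranked
        )
    ]


example1 = [
    (date(2010, 4, 8), date(2010, 8, 6)), (date(2016, 9, 9), date(2016, 9, 16)),
    (date(2016, 9, 16), date(2016, 10, 3)), (date(2016, 10, 3), date(2016, 10, 7)),
    (date(2016, 10, 7), date(2017, 1, 17)), (date(2017, 1, 17), date(2018, 4, 5)),
    (date(2017, 6, 16), date(2017, 6, 20)), (date(2018, 4, 5), None),
]

for row in not_superseded(example1):
    print(row)
```

On Example 1 it keeps rn 1 to 6 and 8, the same rows as in the results table for person 10000351067.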