How to identify overlapping rows

Time: 2020-09-08 19:14:14

Tags: sql date amazon-redshift

I need to identify overlapping rows. Below is the table; OVERLAPPING is the column I need to produce:

| identifier    | status    | startDate     | endDate       | pID   | OVERLAPPING   |
|------------   |---------- |------------   |------------   |-----  |-------------  |
| A             | Approved  | 2020-10-01    | 2020-10-07    | x1    | Yes           |
| A             | Approved  | 2020-10-01    | 2020-10-07    | x2    | No            |
| A             | Approved  | 2020-10-01    | 2020-10-07    | x3    | Yes           |
| A             | Approved  | 2020-10-01    | 2020-10-07    | x4    | No            |
| B             | Approved  | 2020-10-10    | 2020-10-12    | x2    | No            |
| B             | Approved  | 2020-10-10    | 2020-10-12    | x5    | No            |
| C             | Rejected  | 2020-10-05    | 2020-10-06    | x3    | No            |
| C             | Rejected  | 2020-10-05    | 2020-10-06    | x7    | No            |
| C             | Rejected  | 2020-10-05    | 2020-10-06    | x8    | No            |
| C             | Rejected  | 2020-10-05    | 2020-10-06    | x9    | No            |
| D             | Approved  | 2020-10-01    | 2020-10-07    | x5    | No            |
| D             | Approved  | 2020-10-01    | 2020-10-07    | x1    | Yes           |
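| E             | Approved  | 2020-10-03    | 2020-10-04    | x3    | Yes           |

  1. C is in Rejected status, so rows with identifier C are never counted when looking for overlaps. Even though x3 in C overlaps x3 in A, it is not treated as an overlap.
  2. x1 overlaps between A and D, so both rows are treated as overlapping.
  3. x3 in A overlaps x3 in E, because E's start and end dates fall within A's period.
  4. x5 in B and D does not overlap, because the dates of B and D do not overlap.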
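The four rules above can be read as a pairwise check. The following is a minimal Python sketch rather than part of the original question: `ROWS`, `flag_overlaps`, and `FLAGS` are hypothetical names, end dates are assumed to be inclusive, and dates are kept as ISO strings, which compare in chronological order.

```python
# Sample rows from the question: (identifier, status, startDate, endDate, pID)
ROWS = [
    ("A", "Approved", "2020-10-01", "2020-10-07", "x1"),
    ("A", "Approved", "2020-10-01", "2020-10-07", "x2"),
    ("A", "Approved", "2020-10-01", "2020-10-07", "x3"),
    ("A", "Approved", "2020-10-01", "2020-10-07", "x4"),
    ("B", "Approved", "2020-10-10", "2020-10-12", "x2"),
    ("B", "Approved", "2020-10-10", "2020-10-12", "x5"),
    ("C", "Rejected", "2020-10-05", "2020-10-06", "x3"),
    ("C", "Rejected", "2020-10-05", "2020-10-06", "x7"),
    ("C", "Rejected", "2020-10-05", "2020-10-06", "x8"),
    ("C", "Rejected", "2020-10-05", "2020-10-06", "x9"),
    ("D", "Approved", "2020-10-01", "2020-10-07", "x5"),
    ("D", "Approved", "2020-10-01", "2020-10-07", "x1"),
    ("E", "Approved", "2020-10-03", "2020-10-04", "x3"),
]


def flag_overlaps(rows):
    """Return 'Yes'/'No' per row by comparing every pair of rows (O(n^2))."""
    flags = []
    for ident, status, start, end, pid in rows:
        hit = status != "Rejected" and any(
            o_status != "Rejected"                   # rule 1: rejected rows never count
            and o_ident != ident                     # a row does not overlap itself
            and o_pid == pid                         # only compare rows with the same pID
            and o_start <= end and o_end >= start    # inclusive range test
            for o_ident, o_status, o_start, o_end, o_pid in rows
        )
        flags.append("Yes" if hit else "No")
    return flags


FLAGS = flag_overlaps(ROWS)
```

On the sample data, `FLAGS` matches the OVERLAPPING column of the table row for row. A quadratic scan like this is only a reference for checking a SQL solution, not something to run on a large table.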

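If the start and end dates were always identical, I could get this by creating a column that combines start date, end date, and pID, and then counting the rows for each value of that column. A count greater than 1 would mean an overlap. However, this does not cover the x3 case, where the start and end dates differ but the periods still overlap.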
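To make the gap concrete, here is a small Python sketch (not from the original post) that applies the combined-key count to the approved x1 and x3 rows and compares it with a range test. The names `rows`, `by_count`, and `overlaps` are hypothetical, and end dates are assumed to be inclusive.

```python
from collections import Counter

# (identifier, startDate, endDate, pID) for the approved x1 and x3 rows
rows = [
    ("A", "2020-10-01", "2020-10-07", "x1"),
    ("D", "2020-10-01", "2020-10-07", "x1"),
    ("A", "2020-10-01", "2020-10-07", "x3"),
    ("E", "2020-10-03", "2020-10-04", "x3"),
]

# Combined-key approach: count rows per (startDate, endDate, pID).
counts = Counter((start, end, pid) for _, start, end, pid in rows)
by_count = {
    (ident, pid): counts[(start, end, pid)] > 1
    for ident, start, end, pid in rows
}


def overlaps(a_start, a_end, b_start, b_end):
    """Two inclusive periods overlap when each starts on or before the other ends."""
    return a_start <= b_end and b_start <= a_end


# A/x3 and E/x3 have different dates, so only the range test sees the overlap.
x3_by_range = overlaps("2020-10-01", "2020-10-07", "2020-10-03", "2020-10-04")
```

The count flags both x1 rows but neither x3 row, while the range test reports the x3 periods as overlapping. A solution therefore has to compare date ranges rather than exact date pairs.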

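3 answers:

Answer 0: (score: 2)

Something like this?

(I used a correlated subquery so that a join does not return multiple rows when one row overlaps several others.)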

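```sql
SELECT
  *,
  (
    SELECT 1
      FROM yourTable AS lookup
     WHERE lookup.identifier <> yourTable.identifier  -- Don't check overlaps with itself
       AND lookup.pID         = yourTable.pID
       AND lookup.status     <> 'Rejected'            -- Rejected rows never count as overlaps
       AND yourTable.status  <> 'Rejected'            -- ...and are never flagged themselves
       AND lookup.startDate  <= yourTable.endDate
       AND lookup.endDate    >= yourTable.startDate
     LIMIT 1
  ) AS has_overlap  -- 1 when an overlapping row exists, NULL otherwise
FROM
  yourTable
```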

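Note the use of `>=` and `<=`. Whether to keep them depends on whether your endDate is inclusive (I hope not) or exclusive (I hope so): with an inclusive endDate the comparisons above are right; with an exclusive endDate, use `<` and `>` instead.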
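The difference only shows up when two periods touch at a boundary. The following Python sketch illustrates it with two made-up periods that are not in the question's data; `overlaps_inclusive` and `overlaps_exclusive` are hypothetical helper names.

```python
def overlaps_inclusive(a_start, a_end, b_start, b_end):
    """endDate is the last day covered by the period."""
    return a_start <= b_end and b_start <= a_end


def overlaps_exclusive(a_start, a_end, b_start, b_end):
    """endDate is the first day after the period (half-open interval)."""
    return a_start < b_end and b_start < a_end


# Illustrative periods: the first ends on the day the second starts.
first = ("2020-10-01", "2020-10-07")
second = ("2020-10-07", "2020-10-10")

touch_inclusive = overlaps_inclusive(*first, *second)  # both periods cover 2020-10-07
touch_exclusive = overlaps_exclusive(*first, *second)  # first period stops before 2020-10-07
```

Under the inclusive reading the two periods share 2020-10-07 and overlap; under the exclusive reading they merely touch and do not overlap.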

Answer 1: (score: 0)

Based on your description, this seems to match your logic:

```sql
select *,
   case 
     when status = 'Rejected' then 'No'
       -- previous row overlaps
     when startDate < -- maybe <=
          max(case when status <> 'Rejected' then endDate end)
          over (partition by pID
                order by startDate, endDate desc
                rows between unbounded preceding and 1 preceding)
       -- next row overlaps
       or endDate > -- maybe >=
          min(case when status <> 'Rejected' then startDate end)
          over (partition by pID
                order by startDate, endDate desc
                rows between 1 following and unbounded following)
     then 'Yes'
     else 'No'
   end as overlapping
from tab
```
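The two window expressions amount to a running maximum of earlier end dates and a running minimum of later start dates within each pID. Below is a Python sketch of that idea for a single partition, written as an illustration rather than a translation the answer's author provided. `flag_partition`, `X3`, and `X5` are hypothetical names, and `None` stands in for SQL NULL, so a comparison against it never counts as an overlap.

```python
def flag_partition(rows):
    """rows: list of (identifier, status, startDate, endDate) sharing one pID."""
    # Same ordering as the window: startDate ascending, endDate descending.
    # Two stable sorts: secondary key first, then primary key.
    ordered = sorted(rows, key=lambda r: r[3], reverse=True)
    ordered.sort(key=lambda r: r[2])
    n = len(ordered)

    # prev_max_end[i]: latest endDate among non-rejected rows before row i
    prev_max_end = [None] * n
    running = None
    for i, (_, status, _, end) in enumerate(ordered):
        prev_max_end[i] = running
        if status != "Rejected" and (running is None or end > running):
            running = end

    # next_min_start[i]: earliest startDate among non-rejected rows after row i
    next_min_start = [None] * n
    running = None
    for i in range(n - 1, -1, -1):
        next_min_start[i] = running
        _, status, start, _ = ordered[i]
        if status != "Rejected" and (running is None or start < running):
            running = start

    flags = {}
    for i, (ident, status, start, end) in enumerate(ordered):
        hit = status != "Rejected" and (
            (prev_max_end[i] is not None and start < prev_max_end[i])
            or (next_min_start[i] is not None and end > next_min_start[i])
        )
        flags[ident] = "Yes" if hit else "No"
    return flags


# The x3 and x5 partitions from the question
X3 = [
    ("A", "Approved", "2020-10-01", "2020-10-07"),
    ("C", "Rejected", "2020-10-05", "2020-10-06"),
    ("E", "Approved", "2020-10-03", "2020-10-04"),
]
X5 = [
    ("B", "Approved", "2020-10-10", "2020-10-12"),
    ("D", "Approved", "2020-10-01", "2020-10-07"),
]
X3_FLAGS = flag_partition(X3)
X5_FLAGS = flag_partition(X5)
```

Because the running maximum and minimum skip rejected rows and look at all earlier or later rows instead of only the adjacent one, a rejected row sitting between two approved rows does not hide their overlap.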

Answer 2: (score: 0)

Try this (although the Boolean values I use are true and false, but...

```sql
WITH
input(identifier,status,startDate,endDate,pID,OVERLAPPING) AS (
          SELECT 'A','Approved',DATE '2020-10-01',DATE '2020-10-07','x1','Yes'
UNION ALL SELECT 'A','Approved',DATE '2020-10-01',DATE '2020-10-07','x2','No'
UNION ALL SELECT 'A','Approved',DATE '2020-10-01',DATE '2020-10-07','x3','Yes'
UNION ALL SELECT 'A','Approved',DATE '2020-10-01',DATE '2020-10-07','x4','No'
UNION ALL SELECT 'B','Approved',DATE '2020-10-10',DATE '2020-10-12','x2','No'
UNION ALL SELECT 'B','Approved',DATE '2020-10-10',DATE '2020-10-12','x5','No'
UNION ALL SELECT 'C','Rejected',DATE '2020-10-05',DATE '2020-10-06','x3','No'
UNION ALL SELECT 'C','Rejected',DATE '2020-10-05',DATE '2020-10-06','x7','No'
UNION ALL SELECT 'C','Rejected',DATE '2020-10-05',DATE '2020-10-06','x8','No'
UNION ALL SELECT 'C','Rejected',DATE '2020-10-05',DATE '2020-10-06','x9','No'
UNION ALL SELECT 'D','Approved',DATE '2020-10-01',DATE '2020-10-07','x5','No'
UNION ALL SELECT 'D','Approved',DATE '2020-10-01',DATE '2020-10-07','x1','Yes'
UNION ALL SELECT 'E','Approved',DATE '2020-10-03',DATE '2020-10-04','x3','Yes'
)
SELECT 
  *
, status = 'Approved'
  AND (
     COALESCE(LAG(enddate)    OVER(w) ,'0001-01-01')> startdate
  OR COALESCE(LEAD(startdate) OVER(w) ,'9999-12-31')< enddate
  ) AS overlap
FROM input
WINDOW w AS (PARTITION BY pid ORDER BY startdate)
ORDER BY
  identifier
, startdate
;
-- out identifier|status  |startDate |endDate   |pID|OVERLAPPING|overlap
-- out A         |Approved|2020-10-01|2020-10-07|x3 |Yes        |true
-- out A         |Approved|2020-10-01|2020-10-07|x2 |No         |false
-- out A         |Approved|2020-10-01|2020-10-07|x4 |No         |false
-- out A         |Approved|2020-10-01|2020-10-07|x1 |Yes        |true
-- out B         |Approved|2020-10-10|2020-10-12|x5 |No         |false
-- out B         |Approved|2020-10-10|2020-10-12|x2 |No         |false
-- out C         |Rejected|2020-10-05|2020-10-06|x3 |No         |false
-- out C         |Rejected|2020-10-05|2020-10-06|x8 |No         |false
-- out C         |Rejected|2020-10-05|2020-10-06|x7 |No         |false
-- out C         |Rejected|2020-10-05|2020-10-06|x9 |No         |false
-- out D         |Approved|2020-10-01|2020-10-07|x1 |Yes        |true
-- out D         |Approved|2020-10-01|2020-10-07|x5 |No         |false
-- out E         |Approved|2020-10-03|2020-10-04|x3 |Yes        |true
```