(Self-)join on time intervals

Time: 2015-08-01 20:27:40

Tags: sql oracle performance query-optimization self-join

I have a table in an Oracle database. The schema is

create table PERIODS
( 
  ID NUMBER, 
  STARTTIME TIMESTAMP, 
  ENDTIME TIMESTAMP, 
  TYPE VARCHAR2(100)
)

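There are two different values of TYPE, TYPEA and TYPEB. Their periods have independent start and end times, and they can overlap. I want to find the TYPEB periods that start within, are completely contained in, or end within a given TYPEA period.

This is what I have come up with so far (with some sample data)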

WITH mydata 
     AS (SELECT 100                                                    ID, 
                To_timestamp('2015-08-01 11:00', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 11:20', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEA'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 110                                                    ID, 
                To_timestamp('2015-08-01 11:30', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 11:50', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEA'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 120                                                    ID, 
                To_timestamp('2015-08-01 12:00', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 12:20', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEA'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 105                                                    ID, 
                To_timestamp('2015-08-01 10:55', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 11:05', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEB'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 108                                                    ID, 
                To_timestamp('2015-08-01 11:05', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 11:15', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEB'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 111                                                    ID, 
                To_timestamp('2015-08-01 11:15', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 12:25', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEB'                                                TYPE 
         FROM   dual), 
     typeas 
     AS (SELECT starttime, 
                endtime 
         FROM   mydata 
         WHERE  TYPE = 'TYPEA'), 
     typebs 
     AS (SELECT id, 
                starttime, 
                endtime 
         FROM   mydata 
         WHERE  TYPE = 'TYPEB') 
SELECT id 
FROM   typebs b 
       join typeas a 
         ON ( b.starttime BETWEEN a.starttime AND a.endtime ) 
             OR ( b.starttime BETWEEN a.starttime AND a.endtime 
                  AND b.endtime BETWEEN a.starttime AND a.endtime ) 
             OR ( b.endtime BETWEEN a.starttime AND a.endtime ) 
ORDER  BY id; 

This seems to work in principle; the result of the query above is

        ID
----------
       105
       108
       111

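So it selects the three TYPEB periods that start or end within the first TYPEA period.

The problem is that the table has about 200k entries, and already at this size the query above is very slow. That surprises me, because the number of TYPEA and TYPEB entries is rather low (1-2k).

Is there a more efficient way to perform this kind of self-join? Am I missing something else in my query?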
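As a side note on the join condition: the middle OR branch (start and end both inside the TYPEA period) is already implied by the first branch, so the condition can probably be reduced to two branches. The following is only a sketch, assuming the sample rows live in the PERIODS table rather than in the `mydata` CTE; like the original join, it can return an id more than once if a TYPEB period touches several TYPEA periods.

```sql
-- Sketch: same predicate as the three-branch join, minus the redundant branch.
-- Any TYPEB period that is completely contained in a TYPEA period also
-- starts inside it, so the first branch already covers that case.
SELECT b.id
FROM   periods b
       JOIN periods a
         ON  a.type = 'TYPEA'
         AND ( b.starttime BETWEEN a.starttime AND a.endtime
               OR b.endtime BETWEEN a.starttime AND a.endtime )
WHERE  b.type = 'TYPEB'
ORDER  BY b.id;
```

This does not by itself make the query faster; it only removes a branch that cannot change the result.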

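1 Answer:

Answer 0: (score: 1)

Maybe this is worth a try (in Oracle, where you place the most restrictive condition may also matter; don't ask me why, and don't take my word for it, better run the performance test yourself):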

SELECT
   p.id
FROM
   periods p
WHERE
   EXISTS(SELECT * FROM periods q WHERE
      (p.startTime BETWEEN q.startTime AND q.endTime
      OR p.endTime BETWEEN q.startTime AND q.endTime
      OR (p.startTime < q.startTime AND p.endTime > q.endTime) -- also matches periods that fully enclose a TYPEA period; remove if not needed
      ) AND q.type = 'TYPEA'
   ) AND p.type = 'TYPEB'
ORDER BY
   p.id
;
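
Since only 1-2k of the roughly 200k rows are TYPEA or TYPEB, a composite index that leads with TYPE might let Oracle avoid reading the whole table for either side of the comparison. This is a sketch under assumptions, not a tested recommendation: the index name `periods_type_times_ix` is made up, the column order is a guess, and whether the optimizer actually uses the index depends on your statistics. Comparing the execution plan before and after is one way to run the performance test mentioned above.

```sql
-- Hypothetical index (name and column order are assumptions).
CREATE INDEX periods_type_times_ix
  ON periods (type, starttime, endtime);

-- Capture the plan for the EXISTS query and display it.
EXPLAIN PLAN FOR
SELECT p.id
FROM   periods p
WHERE  p.type = 'TYPEB'
       AND EXISTS (SELECT 1
                   FROM   periods q
                   WHERE  q.type = 'TYPEA'
                          AND ( p.starttime BETWEEN q.starttime AND q.endtime
                                OR p.endtime BETWEEN q.starttime AND q.endtime ))
ORDER  BY p.id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The EXISTS query here leaves out the optional "enclosing" branch; add it back if you need those rows.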