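SQL query to find a sub-series of non-overlapping intervals within a series of overlapping (time) intervals

Time: 2014-10-30 19:54:56

Tags: sql algorithm google-bigquery intervals date-range

I have a series of numbered time intervals that may overlap. Important: no two intervals start at the same time, so the start times are strictly increasing.

Illustration: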

Task 1:  1111111
Task 2:     22222222222    
Task 3:             333333333333333
Task 4:                 444444
Task 5:                         5555555
Task 6:                                  66
   .
   .
   .
        0 --- time axis --->

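The intervals represent tasks that should be executed. I am looking for an SQL query that selects the tasks that can be executed, given that only one task can run at a time. The first task is always executed. Next, of all the tasks that start after the first task has finished, the one that starts earliest is executed. And so on.

Result: tasks 1, 3 and 6 can be executed. Illustration: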

Task 1:  1111111                             (yes, first)
Task 2:     -----------                      (no, task 1 is running when 2 begins)
Task 3:             333333333333333          (yes)
Task 4:                 ------               (no, task 3 is running when 4 begins)
Task 5:                         -------      (no, task 3 is running when 5 begins)
Task 6:                                  66  (yes)
   .
   .
   .
        0 --- time axis --->

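With iteration the algorithm is simple: loop over the intervals in ascending order of start time, remembering the end of the last selected interval. However, I am asking for an SQL query, possibly using window functions, that can be executed on e.g. Google BigQuery.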
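The iterative algorithm described above can be sketched in Python as follows. This is only an illustration of the loop, not the requested SQL; the function name `select_tasks` and the tuple layout are assumptions made for the example. A task that starts exactly when the previous selected task ends is treated as runnable, which is what the expected result for the sample data implies.

```python
from typing import Iterable, List, Tuple

# (task_id, start_timestamp, duration_seconds), mirroring the table schema
Task = Tuple[int, int, int]


def select_tasks(tasks: Iterable[Task]) -> List[int]:
    """Return the ids of the tasks that get executed, in start order."""
    selected: List[int] = []
    last_end = None  # end time of the most recently selected task
    for task_id, start, duration in sorted(tasks, key=lambda t: t[1]):
        # A task runs only if the last selected task has already finished.
        if last_end is None or start >= last_end:
            selected.append(task_id)
            last_end = start + duration
    return selected


# Six tasks matching the illustration (the first six rows of the sample data).
example = [(1, 1, 7), (2, 4, 11), (3, 12, 15), (4, 16, 6), (5, 24, 7), (6, 33, 2)]
print(select_tasks(example))  # [1, 3, 6]
```

The loop carries state (`last_end`) from one selected row to the next, which is the part that is awkward to express with a fixed set of window functions.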

Schema of the tasks table:

task_id: integer,
start_timestamp: integer,
duration_seconds: integer.

Sample data:

task_id,start_timestamp,duration_seconds 
1,1,7
2,4,11
3,12,15
4,16,6
5,24,7
6,33,2
7,37,4
8,42,13
9,47,3
10,50,2
11,54,21
12,58,14
13,66,8
14,72,7
15,80,6
16,88,16
17,92,14
18,102,3
19,109,2
20,119,10
21,123,13
22,128,21
23,138,7
24,141,17
25,146,9
26,154,17
27,160,17
28,164,13
29,173,21
30,181,7

Result - selected tasks:

1,3,6,7,8,12,14,15,16,19,20,23,25,27,30

Illustration of the sample data:

Task  1:  1111111
Task  2:     22222222222
Task  3:             333333333333333
Task  4:                 444444
Task  5:                         5555555
Task  6:                                  66
Task  7:                                      7777
Task  8:                                           8888888888888
Task  9:                                                999
Task 10:                                                   10
Task 11:                                                       11xxxxxxxxxxxxxxxxxxx
Task 12:                                                           12xxxxxxxxxxxx
Task 13:                                                                   13xxxxxx
Task 14:                                                                         14xxxxx
Task 15:                                                                                 15xxxx
Task 16:                                                                                         16xxxxxxxxxxxxxx
Task 17:                                                                                             17xxxxxxxxxxxx
Task 18:                                                                                                       18x
Task 19:                                                                                                              19
Task 20:                                                                                                                        20xxxxxxxx
Task 21:                                                                                                                            21xxxxxxxxxxx
Task 22:                                                                                                                                 22xxxxxxxxxxxxxxxxxxx
Task 23:                                                                                                                                           23xxxxx
Task 24:                                                                                                                                              24xxxxxxxxxxxxxxx
Task 25:                                                                                                                                                   25xxxxxxx
Task 26:                                                                                                                                                           26xxxxxxxxxxxxxxx
Task 27:                                                                                                                                                                 27xxxxxxxxxxxxxxx
Task 28:                                                                                                                                                                     28xxxxxxxxxxx
Task 29:                                                                                                                                                                              29xxxxxxxxxxxxxxxxxxx
Task 30:                                                                                                                                                                                      30xxxxx

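Thank you very much for your help.

2 Answers:

Answer 0: (score: 2)

One way to do this is as follows: find the overlapping tasks (those whose start time falls between the start time and end time of another task), and then select all the others.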

SELECT task_id
FROM [table]
WHERE task_id NOT IN (
    SELECT B.task_id
    FROM
    (SELECT task_id, start_timestamp, duration_seconds, start_timestamp + duration_seconds AS end_timestamp
     FROM [table]) AS A
    CROSS JOIN EACH
    (SELECT task_id, start_timestamp, duration_seconds, start_timestamp + duration_seconds AS end_timestamp
     FROM [table]) AS B
    WHERE B.start_timestamp >= A.start_timestamp
      AND B.start_timestamp < A.end_timestamp
      AND A.task_id <> B.task_id)

This solution does not use window functions.
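To see what the query above selects, its predicate can be evaluated in plain Python on the question's sample data. This is a sketch for checking the logic only; the function name `tasks_not_started_during_another` is made up for the example. On this data it returns fewer tasks than the result expected in the question, because a task is dropped whenever any other task is running at its start, whether or not that other task was itself selected.

```python
def tasks_not_started_during_another(tasks):
    """Ids of tasks whose start is not inside any other task's [start, end)."""
    overlapped = {
        b_id
        for a_id, a_start, a_dur in tasks
        for b_id, b_start, _ in tasks
        # Same condition as the WHERE clause of the inner query.
        if a_id != b_id and a_start <= b_start < a_start + a_dur
    }
    return [task_id for task_id, _, _ in tasks if task_id not in overlapped]


# (task_id, start_timestamp, duration_seconds) rows from the question.
sample = [
    (1, 1, 7), (2, 4, 11), (3, 12, 15), (4, 16, 6), (5, 24, 7), (6, 33, 2),
    (7, 37, 4), (8, 42, 13), (9, 47, 3), (10, 50, 2), (11, 54, 21), (12, 58, 14),
    (13, 66, 8), (14, 72, 7), (15, 80, 6), (16, 88, 16), (17, 92, 14), (18, 102, 3),
    (19, 109, 2), (20, 119, 10), (21, 123, 13), (22, 128, 21), (23, 138, 7),
    (24, 141, 17), (25, 146, 9), (26, 154, 17), (27, 160, 17), (28, 164, 13),
    (29, 173, 21), (30, 181, 7),
]
print(tasks_not_started_during_another(sample))
# [1, 6, 7, 8, 15, 16, 19, 20]
```

Task 3, for example, is excluded because it starts while task 2 is still running, even though task 2 itself is never executed.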

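Using window functions is possible, but you have to assume a limit on the number of concurrent (parallel) jobs (3 in this example). Here I use the LAG window function to look up the 3 preceding tasks and check whether a task overlaps with one of them (its start time falls between the start time and end time of a previously started task).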

Select task_id
FROM
(Select task_id, start_timestamp, duration_seconds ,end_timestamp
,LAG(task_id,1) OVER (ORDER BY start_timestamp) as LAG_task_id_1
,LAG(start_timestamp,1) OVER (ORDER BY start_timestamp) as LAG_start_timestamp_1
,LAG(duration_seconds,1) OVER (ORDER BY start_timestamp) as LAG_duration_seconds_1
,LAG(end_timestamp,1) OVER (ORDER BY start_timestamp) as LAG_end_timestamp_1
,LAG(task_id,2) OVER (ORDER BY start_timestamp) as LAG_task_id_2
,LAG(start_timestamp,2) OVER (ORDER BY start_timestamp) as LAG_start_timestamp_2
,LAG(duration_seconds,2) OVER (ORDER BY start_timestamp) as LAG_duration_seconds_2
,LAG(end_timestamp,2) OVER (ORDER BY start_timestamp) as LAG_end_timestamp_2
,LAG(task_id,3) OVER (ORDER BY start_timestamp) as LAG_task_id_3
,LAG(start_timestamp,3) OVER (ORDER BY start_timestamp) as LAG_start_timestamp_3
,LAG(duration_seconds,3) OVER (ORDER BY start_timestamp) as LAG_duration_seconds_3
,LAG(end_timestamp,3) OVER (ORDER BY start_timestamp) as LAG_end_timestamp_3
FROM
(SELECT task_id, start_timestamp, duration_seconds ,start_timestamp+duration_seconds as end_timestamp
FROM [table] ))
where 
(NOT(start_timestamp>=LAG_start_timestamp_1 and start_timestamp<LAG_end_timestamp_1)
and NOT(start_timestamp>=LAG_start_timestamp_2 and start_timestamp<LAG_end_timestamp_2)
and NOT(start_timestamp>=LAG_start_timestamp_3 and start_timestamp<LAG_end_timestamp_3))
OR LAG_start_timestamp_1 IS NULL

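Hope this helps...

Answer 1: (score: 0)

I just wrote something closely related (in Oracle); maybe it can help someone. I found my solution at http://technology.amis.nl/2007/05/07/creating-a-gantt-chart-in-sql/

Here is a piece of code that helps to illustrate the Gantt chart pattern.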

WITH 
PERIODS AS
  ( SELECT  TASK_ID LABEL  ,      
            START_TIMESTAMP START_DATE  ,      
            START_TIMESTAMP+DURATION_SECONDS END_DATE  
    FROM   testet
  ), 
LIMITS AS -- determine the earliest starting date and the latest end date to determine the overall width of the chart
  ( SELECT  MIN(START_DATE) PERIOD_START  ,      
            MAX(END_DATE) PERIOD_END  ,      
            80 WIDTH -- set the width as the number of characters  
    FROM   PERIODS
  ), 
BARS AS
  ( SELECT   LPAD(LABEL, 20)||'|' ACTIVITY  ,        
            (START_DATE - PERIOD_START)/(PERIOD_END - PERIOD_START) * WIDTH FROM_POS, -- the starting position for the bar          
            (END_DATE - PERIOD_START)/(PERIOD_END - PERIOD_START)   * WIDTH TO_POS   -- the end position for the bar  
    FROM     PERIODS  ,        LIMITS
  ) 
SELECT  ACTIVITY||        
        LPAD('I',FROM_POS)         ||
        RPAD('-', TO_POS - FROM_POS, '-')         ||
        'I' GANTT
FROM     BARS
UNION ALL
SELECT RPAD('_',WIDTH + 22,'_')
FROM   LIMITS
UNION ALL
SELECT  LPAD('|',21)       ||
        PERIOD_START       ||
        LPAD(PERIOD_END, 
        WIDTH - 11)
FROM   LIMITS;

Output:

                   1|--I
                   2|I----I
                   3|   I------I
                   4|     I--I
                   5|        I--I
                   6|            II
                   7|             I-I
                   8|               I-----I
                   9|                  I-I
                  10|                   II
                  11|                    I--------I
                  12|                      I-----I
                  13|                         I---I
                  14|                            I--I
                  15|                               I--I
                  16|                                   I------I
                  17|                                    I-----I
                  18|                                        I-I
                  19|                                           II
                  20|                                               I----I
                  21|                                                 I-----I
                  22|                                                   I--------I
                  23|                                                       I--I
                  24|                                                         I-------I
                  25|                                                           I---I
                  26|                                                              I-------I
                  27|                                                                I-------I
                  28|                                                                  I-----I
                  29|                                                                      I--------I
                  30|                                                                         I--I
______________________________________________________________________________________________________
                    |1                                                                  194
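The scaling that the BARS step of the query performs can be sketched in Python to show how one chart row is built. This is an illustration, not a port of the Oracle query: the function name `gantt_row` is made up, and it assumes, as the output above suggests, that LPAD and RPAD truncate fractional lengths and contribute nothing when the length is below 1.

```python
def gantt_row(label, start, end, period_start, period_end, width=80):
    """Render one task as a text bar scaled to `width` characters."""
    span = period_end - period_start
    from_pos = (start - period_start) / span * width  # where the bar begins
    to_pos = (end - period_start) / span * width      # where the bar ends
    # Assumed LPAD behaviour: pad 'I' to the truncated start position,
    # or emit nothing if that position is below 1.
    lead = "I".rjust(int(from_pos)) if int(from_pos) >= 1 else ""
    # Assumed RPAD behaviour: truncated bar length, possibly zero dashes.
    bar = "-" * int(to_pos - from_pos)
    return f"{label:>20}|{lead}{bar}I"


# Tasks 1 to 3 of the sample data; the overall period runs from 1 to 194.
for task_id, start, duration in [(1, 1, 7), (2, 4, 11), (3, 12, 15)]:
    print(gantt_row(task_id, start, start + duration, 1, 194))
```

Because positions are truncated, very short tasks such as task 6 collapse to `II` with no dashes in between, as seen in the output above.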