Merge overlapping time periods, grouped by ID

Posted: 2018-12-27 16:10:58

Tags: sql oracle oracle11g gaps-and-islands

I have rows whose time periods overlap for the same user. For example:

-------------------------------------------------------------
|    ID_USER    |     START_DATE      |      END_DATE       |
-------------------------------------------------------------
|       1       | 01/01/2018 08:00:00 | 01/01/2018 08:50:00 |
|       1       | 01/01/2018 08:15:00 | 01/01/2018 08:20:00 |
|       1       | 01/01/2018 08:45:00 | 01/01/2018 09:55:00 |
|       1       | 01/01/2018 15:45:00 | 01/01/2018 17:00:00 |
|       2       | 01/01/2018 08:45:00 | 01/01/2018 09:50:00 |
|       2       | 01/01/2018 09:15:00 | 01/01/2018 10:00:00 |
-------------------------------------------------------------

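I want to avoid this. I want to merge the overlapping rows into a single row, with the earliest start date as START_DATE and the latest end date as END_DATE. The result for the example above would be: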

-------------------------------------------------------------
|    ID_USER    |     START_DATE      |      END_DATE       |
-------------------------------------------------------------
|       1       | 01/01/2018 08:00:00 | 01/01/2018 09:55:00 |
|       1       | 01/01/2018 15:45:00 | 01/01/2018 17:00:00 |
|       2       | 01/01/2018 08:45:00 | 01/01/2018 10:00:00 |
-------------------------------------------------------------
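For reference, the target result can be pinned down with a small plain-Python sketch of the usual sort-then-sweep interval merge. This is not one of the SQL answers below. It assumes every row falls on 01/01/2018, so times are written as zero-padded "HH:MM" strings (which compare correctly), and it treats periods that merely touch (end equal to the next start) as overlapping.

```python
from itertools import groupby

# Sample rows from the question: (ID_USER, START_DATE, END_DATE).
# Assumption: all rows are on 01/01/2018, so "HH:MM" strings stand in for dates.
rows = [
    (1, "08:00", "08:50"),
    (1, "08:15", "08:20"),
    (1, "08:45", "09:55"),
    (1, "15:45", "17:00"),
    (2, "08:45", "09:50"),
    (2, "09:15", "10:00"),
]

def merge_periods(rows):
    """Merge overlapping periods per user: sort by (user, start), then sweep."""
    merged = []
    for user, group in groupby(sorted(rows), key=lambda r: r[0]):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_end is not None and start <= cur_end:
                # Overlaps the open block: extend it if this row ends later.
                cur_end = max(cur_end, end)
            else:
                # First row or a gap: close the open block and start a new one.
                if cur_end is not None:
                    merged.append((user, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((user, cur_start, cur_end))
    return merged

for row in merge_periods(rows):
    print(*row)
```

The max(cur_end, end) step is what absorbs a row such as 08:15-08:20 that lies entirely inside an earlier one.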

Do you know how to get the desired result with an SQL statement in Oracle?

3 answers:

Answer 0 (score: 3)

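You have two types of intersection: in the first, one period lies entirely inside another (e.g. the second row, 08:15-08:20); in the second, one period overlaps the start or the end of another.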

If you eliminate the first type, you can use lead and lag to peek at what remains; I've added a third set of data (ID_USER 3) for more fun:

select id_user, start_date, end_date,
  case when start_date <= lag(end_date) over (partition by id_user order by start_date)
       then null
       else start_date
  end as calc_start_date,
  case when end_date >= lead(start_date) over (partition by id_user order by end_date)
       then null
       else end_date
  end as calc_end_date
from your_table t1
where not exists (
    select *
    from your_table t2
    where t2.id_user = t1.id_user
    and t2.start_date <= t1.start_date and t2.end_date >= t1.end_date
    and t2.rowid != t1.rowid
);
   ID_USER START_DATE          END_DATE            CALC_START_DATE     CALC_END_DATE         
---------- ------------------- ------------------- ------------------- ----------------------
         1 2018-01-01 08:00:00 2018-01-01 08:50:00 2018-01-01 08:00:00                       
         1 2018-01-01 08:45:00 2018-01-01 09:55:00                     2018-01-01 09:55:00   
         1 2018-01-01 15:45:00 2018-01-01 17:00:00 2018-01-01 15:45:00 2018-01-01 17:00:00   
         2 2018-01-01 08:45:00 2018-01-01 09:50:00 2018-01-01 08:45:00                       
         2 2018-01-01 09:15:00 2018-01-01 10:00:00                     2018-01-01 10:00:00   
         3 2018-01-01 08:00:00 2018-01-01 08:30:00 2018-01-01 08:00:00                       
         3 2018-01-01 08:15:00 2018-01-01 08:45:00                                           
         3 2018-01-01 08:45:00 2018-01-01 09:15:00                                           
         3 2018-01-01 09:00:00 2018-01-01 09:30:00                     2018-01-01 09:30:00   

The not exists clause removes the first type.
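To check this first query outside Oracle, here is a rough plain-Python emulation of the NOT EXISTS filter followed by the two LAG/LEAD CASE expressions. Assumptions: times are "HH:MM" strings on a single day, a list index stands in for ROWID, and the ID_USER 3 input rows are read off the output above, since the answer does not list them.

```python
# (ID_USER, START_DATE, END_DATE); the list index plays the role of ROWID.
# Assumption: ID_USER 3 rows are taken from the answer's output table.
rows = [
    (1, "08:00", "08:50"), (1, "08:15", "08:20"), (1, "08:45", "09:55"),
    (1, "15:45", "17:00"), (2, "08:45", "09:50"), (2, "09:15", "10:00"),
    (3, "08:00", "08:30"), (3, "08:15", "08:45"), (3, "08:45", "09:15"),
    (3, "09:00", "09:30"),
]

def drop_contained(rows):
    """NOT EXISTS: drop a row if another row of the same user fully contains it."""
    return [r for i, r in enumerate(rows)
            if not any(j != i and o[0] == r[0] and o[1] <= r[1] and o[2] >= r[2]
                       for j, o in enumerate(rows))]

def calc_columns(rows):
    """CASE + LAG/LEAD per user. A missing neighbour behaves like SQL NULL:
    the comparison is not true, so the row keeps its own date."""
    result = []
    for user in sorted({r[0] for r in rows}):
        by_start = sorted((r for r in rows if r[0] == user), key=lambda r: r[1])
        by_end = sorted(by_start, key=lambda r: r[2])
        calc_end = {}
        for k, r in enumerate(by_end):
            nxt = by_end[k + 1][1] if k + 1 < len(by_end) else None  # LEAD(start_date)
            calc_end[r] = None if nxt is not None and r[2] >= nxt else r[2]
        for k, r in enumerate(by_start):
            prev = by_start[k - 1][2] if k > 0 else None            # LAG(end_date)
            calc_start = None if prev is not None and r[1] <= prev else r[1]
            result.append(r + (calc_start, calc_end[r]))
    return result

for row in calc_columns(drop_contained(rows)):
    print(*(v or "-" for v in row))
```

Because the emulation follows the SQL condition literally, two identical rows for the same user would each "contain" the other and both be removed; that is one edge case worth checking against real data.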

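You can then collapse what remains. First, eliminate the rows that overlap at both ends (the extra rows for ID_USER 3), which have null for both calculated values; then use lead and lag again to replace the remaining nulls with the values from the adjacent rows: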

select distinct id_user,
  case when calc_start_date is null
       then lag(calc_start_date) over (partition by id_user order by start_date)
       else calc_start_date
  end as start_date,
  case when calc_end_date is null
       then lead(calc_end_date) over (partition by id_user order by end_date)
       else calc_end_date
  end as end_date
from (
  select id_user, start_date, end_date,
    case when start_date <= lag(end_date) over (partition by id_user order by start_date)
         then null
         else start_date
    end as calc_start_date,
    case when end_date >= lead(start_date) over (partition by id_user order by end_date)
         then null
         else end_date
    end as calc_end_date
  from your_table t1
  where not exists (
      select *
      from your_table t2
      where t2.id_user = t1.id_user
      and t2.start_date <= t1.start_date and t2.end_date >= t1.end_date
      and t2.rowid != t1.rowid
  )
)
where calc_start_date is not null
or calc_end_date is not null
order by id_user, start_date, end_date;
   ID_USER START_DATE          END_DATE           
---------- ------------------- -------------------
         1 2018-01-01 08:00:00 2018-01-01 09:55:00
         1 2018-01-01 15:45:00 2018-01-01 17:00:00
         2 2018-01-01 08:45:00 2018-01-01 10:00:00
         3 2018-01-01 08:00:00 2018-01-01 09:30:00
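The outer query can be emulated the same way. This rough sketch takes the intermediate rows from the first output above as input (again as "HH:MM" strings), applies the outer WHERE, fills the remaining NULLs from LAG/LEAD, and de-duplicates like SELECT DISTINCT. As in Oracle, the analytic functions only see the rows that survive the outer WHERE.

```python
# Intermediate rows from the first query's output:
# (ID_USER, START_DATE, END_DATE, CALC_START_DATE, CALC_END_DATE)
inter = [
    (1, "08:00", "08:50", "08:00", None),
    (1, "08:45", "09:55", None, "09:55"),
    (1, "15:45", "17:00", "15:45", "17:00"),
    (2, "08:45", "09:50", "08:45", None),
    (2, "09:15", "10:00", None, "10:00"),
    (3, "08:00", "08:30", "08:00", None),
    (3, "08:15", "08:45", None, None),
    (3, "08:45", "09:15", None, None),
    (3, "09:00", "09:30", None, "09:30"),
]

def collapse(inter):
    # Outer WHERE: drop rows whose calc columns are both NULL.
    kept = [r for r in inter if r[3] is not None or r[4] is not None]
    result = set()  # SELECT DISTINCT
    for user in {r[0] for r in kept}:
        by_start = sorted((r for r in kept if r[0] == user), key=lambda r: r[1])
        by_end = sorted(by_start, key=lambda r: r[2])
        start = {}
        for k, r in enumerate(by_start):
            # NULL -> LAG(calc_start_date) ordered by start_date
            prev = by_start[k - 1][3] if k > 0 else None
            start[r] = r[3] if r[3] is not None else prev
        for k, r in enumerate(by_end):
            # NULL -> LEAD(calc_end_date) ordered by end_date
            nxt = by_end[k + 1][4] if k + 1 < len(by_end) else None
            result.add((user, start[r], r[4] if r[4] is not None else nxt))
    return sorted(result)

for row in collapse(inter):
    print(*row)
```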

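It wouldn't entirely surprise me if there are edge cases I haven't considered that cause problems, but hopefully this will at least be a starting point.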

Answer 1 (score: 1)

Getting the result takes four steps, expressed as three subqueries (WITH clauses) and one main query:

1) Raise END_DATE to the highest value seen so far

This is needed because your END_DATE values are not ordered: for example, the first record intersects the third, but the second record does not.

   ID_USER START_DATE          END_DATE          
---------- ------------------- -------------------
         1 01.01.2018 08:00:00 01.01.2018 08:50:00 
         1 01.01.2018 08:15:00 01.01.2018 08:50:00 
         1 01.01.2018 08:45:00 01.01.2018 09:55:00 
         1 01.01.2018 15:45:00 01.01.2018 17:00:00 
         2 01.01.2018 08:45:00 01.01.2018 09:50:00 
         2 01.01.2018 09:15:00 01.01.2018 10:00:00 
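Step 1 is a running maximum. A plain-Python sketch of MAX(END_DATE) OVER (PARTITION BY ID_USER ORDER BY START_DATE), assuming "HH:MM" strings and no duplicate START_DATE values per user (Oracle's default RANGE window would give tied rows the same value; this sketch goes strictly row by row):

```python
# Question data: (ID_USER, START_DATE, END_DATE), times as "HH:MM" strings.
rows = [
    (1, "08:00", "08:50"), (1, "08:15", "08:20"), (1, "08:45", "09:55"),
    (1, "15:45", "17:00"), (2, "08:45", "09:50"), (2, "09:15", "10:00"),
]

def running_max_end(rows):
    """Replace END_DATE with the largest END_DATE seen so far for the user."""
    out, cur_user, best = [], None, None
    for user, start, end in sorted(rows):
        if user != cur_user:          # new partition: restart the maximum
            cur_user, best = user, end
        best = max(best, end)
        out.append((user, start, best))
    return out

for row in running_max_end(rows):
    print(*row)
```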

2) Define a new group for each non-overlapping block

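Technically: assign a new group id (GRP) to the first record of each ID_USER and to every record that does not overlap its predecessor. The id used is the record's ROW_NUMBER within the user, which is why the second block of user 1 gets 4: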

   ID_USER START_DATE          END_DATE                   GRP
---------- ------------------- ------------------- ----------
         1 01.01.2018 08:00:00 01.01.2018 08:50:00          1 
         1 01.01.2018 08:15:00 01.01.2018 08:50:00            
         1 01.01.2018 08:45:00 01.01.2018 09:55:00            
         1 01.01.2018 15:45:00 01.01.2018 17:00:00          4 
         2 01.01.2018 08:45:00 01.01.2018 09:50:00          1 
         2 01.01.2018 09:15:00 01.01.2018 10:00:00         
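A plain-Python sketch of step 2, taking the step-1 rows above as input ("HH:MM" strings, already sorted per user). A row starts a new group when it has no predecessor (the NVL default START_DATE-1 guarantees that in the SQL) or when the predecessor's END_DATE is earlier than its START_DATE; the group id is the ROW_NUMBER within the user.

```python
# Step-1 output: (ID_USER, START_DATE, running-max END_DATE), sorted per user.
step1 = [
    (1, "08:00", "08:50"), (1, "08:15", "08:50"), (1, "08:45", "09:55"),
    (1, "15:45", "17:00"), (2, "08:45", "09:50"), (2, "09:15", "10:00"),
]

def mark_group_starts(rows):
    out, prev_user, prev_end, rn = [], None, None, 0
    for user, start, end in rows:
        if user != prev_user:                  # new partition
            prev_user, prev_end, rn = user, None, 0
        rn += 1                                # ROW_NUMBER()
        starts_group = prev_end is None or prev_end < start
        out.append((user, start, end, rn if starts_group else None))
        prev_end = end                         # LAG(END_DATE) for the next row
    return out

print([r[3] for r in mark_group_starts(step1)])
```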

3) Fill in the groups

Fill each NULL with the last group id assigned so far (GRP2), so that GROUP BY can be used:

   ID_USER START_DATE          END_DATE                  GRP2
---------- ------------------- ------------------- ----------
         1 01.01.2018 08:00:00 01.01.2018 08:50:00          1 
         1 01.01.2018 08:15:00 01.01.2018 08:50:00          1 
         1 01.01.2018 08:45:00 01.01.2018 09:55:00          1 
         1 01.01.2018 15:45:00 01.01.2018 17:00:00          4 
         2 01.01.2018 08:45:00 01.01.2018 09:50:00          1 
         2 01.01.2018 09:15:00 01.01.2018 10:00:00          1  
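Step 3 is a forward fill. A plain-Python sketch of LAST_VALUE(GRP IGNORE NULLS) OVER (PARTITION BY ID_USER ORDER BY START_DATE), using the GRP values from step 2 (the date columns are left out for brevity):

```python
# Step-2 output reduced to (ID_USER, GRP), in START_DATE order per user.
step2 = [(1, 1), (1, None), (1, None), (1, 4), (2, 1), (2, None)]

def fill_groups(rows):
    out, prev_user, last = [], None, None
    for user, grp in rows:
        if user != prev_user:      # new partition: nothing carried over
            prev_user, last = user, None
        if grp is not None:        # IGNORE NULLS: only real values update it
            last = grp
        out.append((user, last))
    return out

print(fill_groups(step2))
```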

4) GROUP BY

The rest is simple: the dates of each group are its MIN and MAX. You group by the key (ID_USER) and by GRP2:

   ID_USER START_DATE          END_DATE          
---------- ------------------- -------------------
         1 01.01.2018 08:00:00 01.01.2018 09:55:00 
         1 01.01.2018 15:45:00 01.01.2018 17:00:00 
         2 01.01.2018 08:45:00 01.01.2018 10:00:00  
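Step 4 as a plain-Python sketch: group the step-3 rows by (ID_USER, GRP2), keep MIN(START_DATE) and MAX(END_DATE), and sort like ORDER BY 1,2 (times again as "HH:MM" strings):

```python
# Step-3 output: (ID_USER, START_DATE, END_DATE, GRP2).
step3 = [
    (1, "08:00", "08:50", 1), (1, "08:15", "08:50", 1), (1, "08:45", "09:55", 1),
    (1, "15:45", "17:00", 4), (2, "08:45", "09:50", 1), (2, "09:15", "10:00", 1),
]

def collapse_groups(rows):
    groups = {}
    for user, start, end, grp2 in rows:
        lo, hi = groups.get((user, grp2), (start, end))
        groups[(user, grp2)] = (min(lo, start), max(hi, end))
    # Drop GRP2 from the output and order by ID_USER, START_DATE.
    return sorted((user, lo, hi) for (user, _), (lo, hi) in groups.items())

for row in collapse_groups(step3):
    print(*row)
```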

Query

with myt1 as (
select ID_USER, START_DATE, 
max(END_DATE) over (partition by ID_USER order by START_DATE) END_DATE
from my_table),
myt2 as (
select ID_USER,START_DATE, END_DATE,
case when (nvl(lag(END_DATE) over (partition by ID_USER order by START_DATE),START_DATE-1) < START_DATE ) then 
     row_number() over (partition by ID_USER order by START_DATE) end grp
from myt1 
), 
myt3 as (
select ID_USER,START_DATE, END_DATE,
last_value(grp ignore nulls) over (partition by ID_USER order by START_DATE) as grp2
from myt2
),
select
ID_USER, 
min(START_DATE) START_DATE, 
max(END_DATE) END_DATE
from myt3
group by ID_USER, GRP2
order by 1,2;

Data

create table my_table as 
select      1 ID_USER,   to_date('01/01/2018 08:00:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 08:50:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select      1 ID_USER,   to_date('01/01/2018 08:15:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 08:20:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select      1 ID_USER,   to_date('01/01/2018 08:45:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 09:55:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select      1 ID_USER,   to_date('01/01/2018 15:45:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 17:00:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select      2 ID_USER,   to_date('01/01/2018 08:45:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 09:50:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select      2 ID_USER,   to_date('01/01/2018 09:15:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 10:00:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual;

Answer 2 (score: 0)

You are looking for the MIN / MAX functions:

SELECT MIN(aggregate_expression),MAX(aggregate_expression)
FROM tables
[WHERE conditions]
GROUP BY ID;

Reference: https://www.techonthenet.com/oracle/functions/min.php
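A quick plain-Python sketch of what a per-user MIN/MAX aggregation returns on the question's sample data, assuming the grouping column is ID_USER and times are "HH:MM" strings on one day:

```python
# Question data: (ID_USER, START_DATE, END_DATE).
rows = [
    (1, "08:00", "08:50"), (1, "08:15", "08:20"), (1, "08:45", "09:55"),
    (1, "15:45", "17:00"), (2, "08:45", "09:50"), (2, "09:15", "10:00"),
]

def min_max_per_user(rows):
    """SELECT ID_USER, MIN(START_DATE), MAX(END_DATE) ... GROUP BY ID_USER."""
    agg = {}
    for user, start, end in rows:
        lo, hi = agg.get(user, (start, end))
        agg[user] = (min(lo, start), max(hi, end))
    return sorted((user, lo, hi) for user, (lo, hi) in agg.items())

print(min_max_per_user(rows))
```

For ID_USER 1 this yields a single 08:00-17:00 row, whereas the expected output in the question keeps 15:45-17:00 as a separate period, so a plain GROUP BY only matches when each user has a single overlapping block.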