I have rows whose time periods intersect for the same user. For example:
-------------------------------------------------------------
| ID_USER | START_DATE | END_DATE |
-------------------------------------------------------------
| 1 | 01/01/2018 08:00:00 | 01/01/2018 08:50:00 |
| 1 | 01/01/2018 08:15:00 | 01/01/2018 08:20:00 |
| 1 | 01/01/2018 08:45:00 | 01/01/2018 09:55:00 |
| 1 | 01/01/2018 15:45:00 | 01/01/2018 17:00:00 |
| 2 | 01/01/2018 08:45:00 | 01/01/2018 09:50:00 |
| 2 | 01/01/2018 09:15:00 | 01/01/2018 10:00:00 |
-------------------------------------------------------------
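I want to avoid that. I want to merge the overlapping rows into a single row, with the earliest date as the start date and the latest date as the end date. The result for the example above would be: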
-------------------------------------------------------------
| ID_USER | START_DATE | END_DATE |
-------------------------------------------------------------
| 1 | 01/01/2018 08:00:00 | 01/01/2018 09:55:00 |
| 1 | 01/01/2018 15:45:00 | 01/01/2018 17:00:00 |
| 2 | 01/01/2018 08:45:00 | 01/01/2018 10:00:00 |
-------------------------------------------------------------
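Do you know how to get the desired result with a SQL statement in Oracle?
Answer 0: (score: 3)
You have two types of intersection. In the first, one period lies entirely within another (for example the second row, 08:15-08:20). In the second, one period overlaps the start or the end of another.
If you eliminate the first type, you can use lead and lag to peek at what remains. I added a third data set (ID_USER 3) for more fun: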
select id_user, start_date, end_date,
case when start_date <= lag(end_date) over (partition by id_user order by start_date)
then null
else start_date
end as calc_start_date,
case when end_date >= lead(start_date) over (partition by id_user order by end_date)
then null
else end_date
end as calc_end_date
from your_table t1
where not exists (
select *
from your_table t2
where t2.id_user = t1.id_user
and t2.start_date <= t1.start_date and t2.end_date >= t1.end_date
and t2.rowid != t1.rowid
);
ID_USER START_DATE          END_DATE            CALC_START_DATE     CALC_END_DATE
------- ------------------- ------------------- ------------------- -------------------
      1 2018-01-01 08:00:00 2018-01-01 08:50:00 2018-01-01 08:00:00
      1 2018-01-01 08:45:00 2018-01-01 09:55:00                     2018-01-01 09:55:00
      1 2018-01-01 15:45:00 2018-01-01 17:00:00 2018-01-01 15:45:00 2018-01-01 17:00:00
      2 2018-01-01 08:45:00 2018-01-01 09:50:00 2018-01-01 08:45:00
      2 2018-01-01 09:15:00 2018-01-01 10:00:00                     2018-01-01 10:00:00
      3 2018-01-01 08:00:00 2018-01-01 08:30:00 2018-01-01 08:00:00
      3 2018-01-01 08:15:00 2018-01-01 08:45:00
      3 2018-01-01 08:45:00 2018-01-01 09:15:00
      3 2018-01-01 09:00:00 2018-01-01 09:30:00                     2018-01-01 09:30:00
The not exists clause removes the first type (periods fully contained in another period).
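As a rough illustration of what that not exists filter does, here is a minimal Python sketch over the question's rows for user 1. The helper names `t` and `drop_contained` are made up for this example and are not part of the answer's SQL.

```python
from datetime import datetime

def t(hhmm):
    """Build a timestamp on 2018-01-01 from 'HH:MM' (helper for this sketch only)."""
    return datetime.strptime("2018-01-01 " + hhmm, "%Y-%m-%d %H:%M")

# (id_user, start_date, end_date) rows for user 1, taken from the question.
rows = [
    (1, t("08:00"), t("08:50")),
    (1, t("08:15"), t("08:20")),
    (1, t("08:45"), t("09:55")),
    (1, t("15:45"), t("17:00")),
]

def drop_contained(rows):
    """Keep a row only if no *other* row of the same user fully covers it."""
    kept = []
    for i, (user, start, end) in enumerate(rows):
        covered = any(
            j != i and u == user and s <= start and e >= end
            for j, (u, s, e) in enumerate(rows)
        )
        if not covered:
            kept.append((user, start, end))
    return kept

remaining = drop_contained(rows)
for user, start, end in remaining:
    print(user, start, end)
```

The 08:15-08:20 row disappears because the 08:00-08:50 row covers it. One thing to watch: with this condition, two rows with identical start and end dates cover each other, so both would be dropped, in this sketch and in the SQL alike.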
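You can then collapse what is left. First eliminate the rows that overlap at both ends (as in the extra rows for ID 3), where both calculated values are null; then use lead and lag again to replace each remaining null with the value from the adjacent row: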
select distinct id_user,
case when calc_start_date is null
then lag(calc_start_date) over (partition by id_user order by start_date)
else calc_start_date
end as start_date,
case when calc_end_date is null
then lead(calc_end_date) over (partition by id_user order by end_date)
else calc_end_date
end as end_date
from (
select id_user, start_date, end_date,
case when start_date <= lag(end_date) over (partition by id_user order by start_date)
then null
else start_date
end as calc_start_date,
case when end_date >= lead(start_date) over (partition by id_user order by end_date)
then null
else end_date
end as calc_end_date
from your_table t1
where not exists (
select *
from your_table t2
where t2.id_user = t1.id_user
and t2.start_date <= t1.start_date and t2.end_date >= t1.end_date
and t2.rowid != t1.rowid
)
)
where calc_start_date is not null
or calc_end_date is not null
order by id_user, start_date, end_date;
ID_USER START_DATE END_DATE
---------- ------------------- -------------------
1 2018-01-01 08:00:00 2018-01-01 09:55:00
1 2018-01-01 15:45:00 2018-01-01 17:00:00
2 2018-01-01 08:45:00 2018-01-01 10:00:00
3 2018-01-01 08:00:00 2018-01-01 09:30:00
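It wouldn't entirely surprise me if there are edge cases I haven't considered that cause problems, but hopefully this is a starting point anyway.
Answer 1: (score: 1)
Getting the result takes four steps, represented by three subqueries and one main query:
1) Raise END_DATE to the highest value seen so far
This is required because your END_DATE values are not ordered. For example, the first record intersects the third, but the second record does not intersect the third.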
ID_USER START_DATE END_DATE
---------- ------------------- -------------------
1 01.01.2018 08:00:00 01.01.2018 08:50:00
1 01.01.2018 08:15:00 01.01.2018 08:50:00
1 01.01.2018 08:45:00 01.01.2018 09:55:00
1 01.01.2018 15:45:00 01.01.2018 17:00:00
2 01.01.2018 08:45:00 01.01.2018 09:50:00
2 01.01.2018 09:15:00 01.01.2018 10:00:00
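The running maximum in step 1 can be tried outside the database. This is a small Python sketch for user 1 only; it assumes the rows are already sorted by start time and uses "HH:MM" strings, which compare correctly as text.

```python
from itertools import accumulate

# (start, end) pairs for user 1 from the question, sorted by start.
intervals = [
    ("08:00", "08:50"),
    ("08:15", "08:20"),
    ("08:45", "09:55"),
    ("15:45", "17:00"),
]

# Running maximum of the end times, like
# max(END_DATE) over (partition by ID_USER order by START_DATE).
running_end = list(accumulate((end for _, end in intervals), max))

# Pair each original start with the raised end.
raised = [(start, end) for (start, _), end in zip(intervals, running_end)]
for row in raised:
    print(row)
```

The second row's end moves from 08:20 up to 08:50, so a plain comparison with the previous row is then enough to detect that the third row (starting 08:45) still overlaps.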
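2) Define a new group for each non-overlapping block
Technically, the first record (per ID_USER) and every record that does not overlap its predecessor is assigned a new group id (GRP).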
ID_USER START_DATE END_DATE GRP
---------- ------------------- ------------------- ----------
1 01.01.2018 08:00:00 01.01.2018 08:50:00 1
1 01.01.2018 08:15:00 01.01.2018 08:50:00
1 01.01.2018 08:45:00 01.01.2018 09:55:00
1 01.01.2018 15:45:00 01.01.2018 17:00:00 4
2 01.01.2018 08:45:00 01.01.2018 09:50:00 1
2 01.01.2018 09:15:00 01.01.2018 10:00:00
3) Fill in the groups
Fill each NULL with the last assigned group id, to enable the GROUP BY.
ID_USER START_DATE END_DATE GRP2
---------- ------------------- ------------------- ----------
1 01.01.2018 08:00:00 01.01.2018 08:50:00 1
1 01.01.2018 08:15:00 01.01.2018 08:50:00 1
1 01.01.2018 08:45:00 01.01.2018 09:55:00 1
1 01.01.2018 15:45:00 01.01.2018 17:00:00 4
2 01.01.2018 08:45:00 01.01.2018 09:50:00 1
2 01.01.2018 09:15:00 01.01.2018 10:00:00 1
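Step 3 is a forward fill. As a minimal Python sketch, using the GRP values of user 1 from step 2 (the variable names are illustrative only):

```python
# GRP column for user 1 after step 2; None stands for NULL.
grp = [1, None, None, 4]

# Carry the last non-null group id forward, like
# last_value(grp ignore nulls) over (partition by ID_USER order by START_DATE).
grp2 = []
last_seen = None
for value in grp:
    if value is not None:
        last_seen = value
    grp2.append(last_seen)

print(grp2)
```

Every row now carries the id of the block it belongs to, so the rows of one block can be grouped together.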
4) GROUP BY
The rest is simple: the dates of each group are its MIN and MAX. You group by the key (ID_USER) and GRP2.
ID_USER START_DATE END_DATE
---------- ------------------- -------------------
1 01.01.2018 08:00:00 01.01.2018 09:55:00
1 01.01.2018 15:45:00 01.01.2018 17:00:00
2 01.01.2018 08:45:00 01.01.2018 10:00:00
Query
with myt1 as (
select ID_USER, START_DATE,
max(END_DATE) over (partition by ID_USER order by START_DATE) END_DATE
from my_table),
myt2 as (
select ID_USER,START_DATE, END_DATE,
case when (nvl(lag(END_DATE) over (partition by ID_USER order by START_DATE),START_DATE-1) < START_DATE ) then
row_number() over (partition by ID_USER order by START_DATE) end grp
from myt1
),
myt3 as (
select ID_USER,START_DATE, END_DATE,
last_value(grp ignore nulls) over (partition by ID_USER order by START_DATE) as grp2
from myt2
)
select
ID_USER,
min(START_DATE) START_DATE,
max(END_DATE) END_DATE
from myt3
group by ID_USER, GRP2
order by 1,2;
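To cross-check the query's output, the same idea can be sketched in Python as a single pass per user. This is an illustration under assumptions, not a translation of the SQL: the function name `merge_intervals` is made up, and timestamps are ISO-style strings so that they sort and compare as text.

```python
from itertools import groupby

def merge_intervals(rows):
    """Merge overlapping (id_user, start, end) rows per user."""
    merged = []
    ordered = sorted(rows)  # by id_user, then start
    for user, group in groupby(ordered, key=lambda r: r[0]):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_end is None or start > cur_end:
                # No overlap with the running block: close it, start a new one.
                if cur_end is not None:
                    merged.append((user, cur_start, cur_end))
                cur_start, cur_end = start, end
            else:
                # Overlap: extend the block to the highest end seen so far.
                cur_end = max(cur_end, end)
        merged.append((user, cur_start, cur_end))
    return merged

rows = [
    (1, "2018-01-01 08:00", "2018-01-01 08:50"),
    (1, "2018-01-01 08:15", "2018-01-01 08:20"),
    (1, "2018-01-01 08:45", "2018-01-01 09:55"),
    (1, "2018-01-01 15:45", "2018-01-01 17:00"),
    (2, "2018-01-01 08:45", "2018-01-01 09:50"),
    (2, "2018-01-01 09:15", "2018-01-01 10:00"),
]

result = merge_intervals(rows)
for row in result:
    print(row)
```

Like the query's `<` comparison, this treats a period that starts exactly when the previous block ends as part of that block.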
Data
create table my_table as
select 1 ID_USER, to_date('01/01/2018 08:00:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 08:50:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select 1 ID_USER, to_date('01/01/2018 08:15:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 08:20:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select 1 ID_USER, to_date('01/01/2018 08:45:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 09:55:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select 1 ID_USER, to_date('01/01/2018 15:45:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 17:00:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select 2 ID_USER, to_date('01/01/2018 08:45:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 09:50:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select 2 ID_USER, to_date('01/01/2018 09:15:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 10:00:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual;
Answer 2: (score: 0)
You are looking for the MIN / MAX functions:
SELECT MIN(aggregate_expression),MAX(aggregate_expression)
FROM tables
[WHERE conditions]
GROUP BY ID;
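Note what this template yields when ID is the user id alone. The Python sketch below applies a plain per-user MIN/MAX to the question's rows; the variable names are illustrative.

```python
rows = [
    (1, "08:00", "08:50"),
    (1, "08:15", "08:20"),
    (1, "08:45", "09:55"),
    (1, "15:45", "17:00"),
    (2, "08:45", "09:50"),
    (2, "09:15", "10:00"),
]

# MIN(start) and MAX(end) per user, i.e. GROUP BY ID_USER only.
per_user = {}
for user, start, end in rows:
    lo, hi = per_user.get(user, (start, end))
    per_user[user] = (min(lo, start), max(hi, end))

print(per_user)
```

User 1 collapses into a single 08:00-17:00 row, which swallows the gap between 09:55 and 15:45. The question expects two rows for that user, so MIN / MAX needs an additional grouping key that separates non-overlapping blocks.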