Oracle: finding and rewriting consecutive rows

Time: 2018-10-28 17:11:36

Tags: sql oracle gaps-and-islands

I have a table that contains rows like these:

ID  DATE
1   1.01.2018 13:30
1   1.01.2018 13:31
2   1.01.2018 13:32
2   1.01.2018 13:33
1   1.01.2018 13:34
3   1.01.2018 13:35
3   1.01.2018 13:35
3   1.01.2018 13:35
3   1.01.2018 13:36
1   1.01.2018 13:37
3   1.01.2018 13:38
4   1.01.2018 13:39
4   1.01.2018 13:40
1   1.01.2018 13:40

I want to find the start and end dates of each event.

Desired output:

ID    START_DATE              END_DATE
1   1.01.2018 13:30     1.01.2018 13:31
2   1.01.2018 13:32     1.01.2018 13:33
1   1.01.2018 13:34     1.01.2018 13:34
3   1.01.2018 13:35     1.01.2018 13:36
1   1.01.2018 13:37     1.01.2018 13:37
3   1.01.2018 13:38     1.01.2018 13:38
4   1.01.2018 13:39     1.01.2018 13:40
1   1.01.2018 13:40     1.01.2018 13:40

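With the rows ordered by date, for as long as the same ID continues:

  • start date = the first date of the run

  • end date = the last date before the ID changes

How can I write this query?

Thanks.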

4 answers:

Answer 0: (score: 4)

This is a gaps-and-islands problem. For this version, I would suggest the difference of row numbers:

-- dte is the question's date column (DATE is a reserved word in Oracle)
select id, min(dte) as start_date, max(dte) as end_date
from (select t.*,
             row_number() over (order by dte) as seqnum,
             row_number() over (partition by id order by dte) as seqnum_i
      from t
     ) t
group by id, (seqnum - seqnum_i);
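
To see why the difference of the two row numbers identifies an island, here is a minimal Python trace of the same idea. It is only a sketch, not Oracle code: the function name `islands` is made up for illustration, the minute (30, 31, ...) stands in for the full timestamp, and the duplicate and tied rows from the question are left out on purpose so that the ordering is unambiguous.

```python
from collections import defaultdict

# Sample rows from the question as (id, minute) pairs.
# Assumption: duplicates and the tied 13:40 rows are omitted.
rows = [(1, 30), (1, 31), (2, 32), (2, 33), (1, 34),
        (3, 35), (3, 36), (1, 37), (3, 38), (4, 39)]

def islands(rows):
    ordered = sorted(rows, key=lambda r: r[1])       # ORDER BY date
    seqnum_i = defaultdict(int)                      # row_number() per id
    groups = {}                                      # (id, seqnum - seqnum_i) -> (min, max)
    for seqnum, (id_, minute) in enumerate(ordered, start=1):
        seqnum_i[id_] += 1
        key = (id_, seqnum - seqnum_i[id_])          # constant within one island
        lo, hi = groups.get(key, (minute, minute))
        groups[key] = (min(lo, minute), max(hi, minute))
    # One (id, start, end) tuple per island, ordered by start
    found = [(id_, lo, hi) for (id_, _), (lo, hi) in groups.items()]
    return sorted(found, key=lambda t: t[1])

for island in islands(rows):
    print(island)
```

While the same ID continues, both row numbers grow by one per row, so their difference stays fixed; as soon as another ID intervenes, only the global row number advances and the difference changes.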

As noted, the problem is not well defined, because there are ties in the date/time values. The following comes close:

select id, min(dte), max(dte)
from (select t.*,
             row_number() over (order by dte) as seqnum,
             row_number() over (partition by id order by dte) as seqnum_i
      from (select distinct id, dte from t) t
     ) t
group by id, (seqnum - seqnum_i)

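(See the db<>fiddle here.) Because of the ties, the db<>fiddle returns two rows for "4".

But the ties make the problem ill-defined.

EDIT:

Well, the duplicates make this a hard problem, but it can be solved with window functions. The key idea is to compare the previous date for the ID with the previous date in the data as a whole. When the two differ, a new group starts for that ID; this defines the groups.

So: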

select id, min(dte), max(dte)
from (select t.*,
             sum(case when prev_id_dte = prev_dte then 0 else 1 end) over (partition by id order by dte) as grp
      from (select t.*,
                   lag(dte) over (partition by id order by dte) as prev_id_dte,
                   (select max(dte) from t t2 where t2.dte < t.dte) as prev_dte
            from (select distinct id, dte
                  from t
                 ) t
           ) t
     ) t
group by id, grp;

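Here is a db<>fiddle for this version.

I'm not thrilled with the correlated subquery, but I don't think there is a simple way to use window functions to get the previous value from grouped data. There is a way, but it requires several levels of nesting. The subquery is simpler.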
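
The comparison at the heart of the edited query can be traced in a few lines of Python. This is only an illustrative sketch under stated assumptions: `islands_with_ties` is a hypothetical name, minutes stand in for the timestamps, and the loop mirrors the query's logic rather than how Oracle evaluates it.

```python
# Full sample from the question as (id, minute), including duplicates and the 13:40 tie
rows = [(1, 30), (1, 31), (2, 32), (2, 33), (1, 34), (3, 35), (3, 35),
        (3, 35), (3, 36), (1, 37), (3, 38), (4, 39), (4, 40), (1, 40)]

def islands_with_ties(rows):
    distinct = sorted(set(rows), key=lambda r: (r[1], r[0]))   # select distinct id, dte
    all_dtes = sorted({d for _, d in distinct})
    # prev_dte: the largest date in the data strictly before d (None for the first)
    prev_dte = {d: (all_dtes[i - 1] if i else None) for i, d in enumerate(all_dtes)}
    last_seen = {}   # id -> previous date for that id (the lag per id)
    grp = {}         # id -> running group counter for that id
    spans = {}       # (id, grp) -> [start, end]
    for id_, d in distinct:
        prev_id_dte = last_seen.get(id_)
        # A new group starts unless this id also appeared on the previous date
        if prev_id_dte is None or prev_id_dte != prev_dte[d]:
            grp[id_] = grp.get(id_, 0) + 1
        last_seen[id_] = d
        span = spans.setdefault((id_, grp[id_]), [d, d])
        span[1] = d                                  # dates arrive in ascending order
    found = [(id_, s, e) for (id_, _), (s, e) in spans.items()]
    return sorted(found, key=lambda t: (t[1], t[0]))

for island in islands_with_ties(rows):
    print(island)
```

Because the test only asks whether the ID was present at the immediately preceding date, the tie at 13:40 no longer matters: ID 4 continues its run from 13:39, while ID 1 starts a new one.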

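Answer 1: (score: 2)

Because several rows share the same date/time value, the order of the rows is ambiguous. I therefore decided to order by date and then by ID.

Note: since DATE is a reserved word in Oracle, I renamed the column from DATE to d.

If your data is: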

create table t (
  id number(6),
  d date
);

insert into t (id, d) values (1, timestamp '2018-01-01 13:30:00');
insert into t (id, d) values (1, timestamp '2018-01-01 13:31:00');
insert into t (id, d) values (2, timestamp '2018-01-01 13:32:00');
insert into t (id, d) values (2, timestamp '2018-01-01 13:33:00');
insert into t (id, d) values (1, timestamp '2018-01-01 13:34:00');
insert into t (id, d) values (3, timestamp '2018-01-01 13:35:00');
insert into t (id, d) values (3, timestamp '2018-01-01 13:35:00');
insert into t (id, d) values (3, timestamp '2018-01-01 13:35:00');
insert into t (id, d) values (3, timestamp '2018-01-01 13:36:00');
insert into t (id, d) values (1, timestamp '2018-01-01 13:37:00');
insert into t (id, d) values (3, timestamp '2018-01-01 13:38:00');
insert into t (id, d) values (4, timestamp '2018-01-01 13:39:00');
insert into t (id, d) values (4, timestamp '2018-01-01 13:40:00');
insert into t (id, d) values (1, timestamp '2018-01-01 13:40:00');

A query that solves this could be:

with x as (
  -- ini = 1 on the first row of a run, fin = 1 on the last row of a run
  select
    t.*,
    case when id = lag(id) over (order by d, id) then 0 else 1 end as ini,
    case when id = lead(id) over (order by d, id) then 0 else 1 end as fin
  from t
),
y as (
  -- keep only the boundary rows; lead() is evaluated here, before the
  -- ini = 1 filter below, so that it still sees the last row of each run
  select x.*, lead(d) over (order by d, id) as next_d
  from x
  where ini <> 0 or fin <> 0
)
select
    id,
    d as start_date,
    case when fin = 1 then d else next_d end as end_date
  from y
 where ini = 1
 order by start_date, id;

Result:

ID  START_DATE             END_DATE
--  ---------------------  ---------------------
1   2018-01-01 13:30:00.0  2018-01-01 13:31:00.0
2   2018-01-01 13:32:00.0  2018-01-01 13:33:00.0
1   2018-01-01 13:34:00.0  2018-01-01 13:34:00.0
3   2018-01-01 13:35:00.0  2018-01-01 13:36:00.0
1   2018-01-01 13:37:00.0  2018-01-01 13:37:00.0
3   2018-01-01 13:38:00.0  2018-01-01 13:38:00.0
4   2018-01-01 13:39:00.0  2018-01-01 13:39:00.0
1   2018-01-01 13:40:00.0  2018-01-01 13:40:00.0
4   2018-01-01 13:40:00.0  2018-01-01 13:40:00.0

Answer 2: (score: 2)

This can also be done with pattern matching, using MATCH_RECOGNIZE (available from Oracle 12c):

SELECT THE_ID,
       TO_CHAR(MIN_DATE , 'MM.DD.YYYY HH24:MI:SS') AS START_DATE,
       TO_CHAR(MAX_DATE , 'MM.DD.YYYY HH24:MI:SS') AS END_DATE
FROM T
       MATCH_RECOGNIZE (
         ORDER BY "DATE"
         MEASURES
           ID AS THE_ID,
           MIN("DATE") AS MIN_DATE,
           MAX("DATE") AS MAX_DATE
         ONE ROW PER MATCH
         AFTER MATCH SKIP PAST LAST ROW
         PATTERN (IN_RUN{0,} END_RUN )
         DEFINE
           IN_RUN AS (ID = NEXT(ID)),
           END_RUN AS ID != ANY (NEXT(ID) , PREV(ID)))
ORDER BY START_DATE ASC, END_DATE ASC;

Result:

    THE_ID START_DATE          END_DATE
---------- ------------------- -------------------
         1 01.01.2018 13:30:00 01.01.2018 13:31:00
         2 01.01.2018 13:32:00 01.01.2018 13:33:00
         1 01.01.2018 13:34:00 01.01.2018 13:34:00
         3 01.01.2018 13:35:00 01.01.2018 13:36:00
         1 01.01.2018 13:37:00 01.01.2018 13:37:00
         3 01.01.2018 13:38:00 01.01.2018 13:38:00
         4 01.01.2018 13:39:00 01.01.2018 13:40:00
         1 01.01.2018 13:40:00 01.01.2018 13:40:00

8 rows selected.

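Answer 3: (score: 1)

You can build the answer step by step with window functions.

Step 1 - Order the rows by timestamp and use LEAD to determine where each "group" ends, that is, where the value of id changes on the next row. Mark each such row with a "Y".

Step 2 - For every row, count the "Y" marks on the rows before it. This count is the "group number", so every run of consecutive rows with the same ID gets the same group number.

Step 3 - Take the min and max timestamps within each group number as the start and end times of that event.

It may not be as compact and cool as some of the other possible solutions, but I have a better chance of remembering how it works when I come back to it in six months. That's just me.

Here is the whole thing.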

WITH input (id, ts) AS (
SELECT 1, TO_DATE(  '01.01.2018 13:30','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 1, TO_DATE(  '01.01.2018 13:31','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 2, TO_DATE(  '01.01.2018 13:32','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 2, TO_DATE(  '01.01.2018 13:33','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 1, TO_DATE(  '01.01.2018 13:34','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 3, TO_DATE(  '01.01.2018 13:35','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 3, TO_DATE(  '01.01.2018 13:35','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 3, TO_DATE(  '01.01.2018 13:35','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 3, TO_DATE(  '01.01.2018 13:36','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 1, TO_DATE(  '01.01.2018 13:37','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 3, TO_DATE(  '01.01.2018 13:38','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 4, TO_DATE(  '01.01.2018 13:39','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 4, TO_DATE(  '01.01.2018 13:40','DD.MM.YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 1, TO_DATE(  '01.01.2018 13:40','DD.MM.YYYY HH24:MI') FROM DUAL ), 
-- Solution starts here
input_with_group_markers as (
SELECT id, ts,
case when lead(id,1) over ( order by ts ) != id THEN 'Y' ELSE NULL END last_row_in_group
FROM input
),
grouped_input as (
SELECT igwm.*, count(last_row_in_group) OVER ( order by ts rows between unbounded preceding and 1 preceding ) group_number
FROM input_with_group_markers igwm )
SELECT min(id) id, 
       to_char(min(ts),'DD.MM.YYYY HH24:MI') event_start, 
       to_char(max(ts),'DD.MM.YYYY HH24:MI') event_end
FROM grouped_input
group by group_number
order by group_number;
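
The three steps can also be traced outside the database. The following Python sketch is an illustration only: `events` is a hypothetical name, minutes stand in for the timestamps, and Python's stable sort keeps tied rows in input order, whereas Oracle makes no such promise for rows with equal ts.

```python
from itertools import accumulate

# Full sample from the question as (id, minute), in the order given there
rows = [(1, 30), (1, 31), (2, 32), (2, 33), (1, 34), (3, 35), (3, 35),
        (3, 35), (3, 36), (1, 37), (3, 38), (4, 39), (4, 40), (1, 40)]

def events(rows):
    ordered = sorted(rows, key=lambda r: r[1])       # order by ts (stable for ties)
    # Step 1: mark a row when the next row has a different id (the "Y" marker)
    marks = [i + 1 < len(ordered) and ordered[i + 1][0] != row[0]
             for i, row in enumerate(ordered)]
    # Step 2: group number = number of marks on the rows strictly before this one
    group_numbers = [0] + list(accumulate(int(m) for m in marks))[:-1]
    # Step 3: min and max timestamp per group number
    result = {}
    for (id_, ts), g in zip(ordered, group_numbers):
        entry = result.setdefault(g, [id_, ts, ts])
        entry[1] = min(entry[1], ts)
        entry[2] = max(entry[2], ts)
    return [tuple(result[g]) for g in sorted(result)]

for event in events(rows):
    print(event)
```

With the rows in the question's order this yields the eight events of the desired output. If the two 13:40 rows were visited in the opposite order, ID 4 would be split into two events, which is the same tie sensitivity the SQL has when it orders by ts alone.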