I have a data set in the following format:
ID START_TIME END_TIME VAL
1 30-APR-2018 00:00:00 01-MAY-2018 00:00:00 423
2 01-MAY-2018 00:00:00 01-MAY-2018 17:15:00 455
3 01-MAY-2018 17:15:00 03-MAY-2018 00:00:00 455
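Expected output:
The data set should be broken down into 30-minute intervals. A record that does not start or end on a '00' or '30' minute mark should still be part of this process, with its slices aligned to the half-hour marks and a shorter partial slice where needed (as with START_TIME / END_TIME = '17:15:00').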
ID START_TIME END_TIME VAL
1 30-APR-2018 00:00:00 30-APR-2018 00:30:00 423
1 30-APR-2018 00:30:00 30-APR-2018 01:00:00 423
1 30-APR-2018 01:00:00 30-APR-2018 01:30:00 423
..
..
..
1 30-APR-2018 23:00:00 30-APR-2018 23:30:00 423
1 30-APR-2018 23:30:00 01-MAY-2018 00:00:00 423
2 01-MAY-2018 00:00:00 01-MAY-2018 00:30:00 455
2 01-MAY-2018 00:30:00 01-MAY-2018 01:00:00 455
..
..
..
..
2 01-MAY-2018 16:30:00 01-MAY-2018 17:00:00 455
2 01-MAY-2018 17:00:00 01-MAY-2018 17:15:00 455
3 01-MAY-2018 17:15:00 01-MAY-2018 17:30:00 455
3 01-MAY-2018 17:30:00 01-MAY-2018 18:00:00 455
..
..
..
3 02-MAY-2018 23:00:00 02-MAY-2018 23:30:00 455
3 02-MAY-2018 23:30:00 03-MAY-2018 00:00:00 455
What I have tried so far:
CREATE TABLE TESTT
(
ID NUMBER(8,3),
START_TIME DATE,
END_TIME DATE,
VAL NUMBER(8,3)
);
INSERT INTO TESTT VALUES (1, TO_DATE('30-APR-2018 00:00:00','DD-MON-YYYY HH24:MI:SS'), TO_DATE('01-MAY-2018 00:00:00','DD-MON-YYYY HH24:MI:SS'), 423);
INSERT INTO TESTT VALUES (2, TO_DATE('01-MAY-2018 00:00:00','DD-MON-YYYY HH24:MI:SS'), TO_DATE('01-MAY-2018 17:15:00','DD-MON-YYYY HH24:MI:SS'), 455);
INSERT INTO TESTT VALUES (3, TO_DATE('01-MAY-2018 17:15:00','DD-MON-YYYY HH24:MI:SS'), TO_DATE('03-MAY-2018 00:00:00','DD-MON-YYYY HH24:MI:SS'), 455);
COMMIT;
CREATE TABLE TESTT_OUTPUT AS
SELECT * FROM TESTT WHERE 1=2;
CREATE SEQUENCE TESTT_SEQ MINVALUE 1 MAXVALUE 9999999999999999999999999999 INCREMENT BY 1 START WITH 1 NOCACHE NOORDER NOCYCLE NOPARTITION;
BEGIN
FOR R IN (SELECT * FROM TESTT)
LOOP
INSERT INTO TESTT_OUTPUT(id, START_TIME, END_TIME, VAL)
SELECT TESTT_SEQ.nextval, R.START_TIME + (LEVEL - 1)/48 AS START_TIME, R.START_TIME + LEVEL/48 AS END_TIME, R.VAL FROM
DUAL
CONNECT BY LEVEL <= ROUND((R.END_TIME - R.START_TIME)*48);
COMMIT;
END LOOP;
END;
/
SELECT * FROM TESTT_OUTPUT;
1 30-APR-2018 00:00:00 30-APR-2018 00:30:00 423
2 30-APR-2018 00:30:00 30-APR-2018 01:00:00 423
3 30-APR-2018 01:00:00 30-APR-2018 01:30:00 423
..
..
..
47 30-APR-2018 23:00:00 30-APR-2018 23:30:00 423
48 30-APR-2018 23:30:00 01-MAY-2018 00:00:00 423
49 01-MAY-2018 00:00:00 01-MAY-2018 00:30:00 455
50 01-MAY-2018 00:30:00 01-MAY-2018 01:00:00 455
..
..
..
82 01-MAY-2018 16:30:00 01-MAY-2018 17:00:00 455
83 01-MAY-2018 17:00:00 01-MAY-2018 17:30:00 455
84 01-MAY-2018 17:15:00 01-MAY-2018 17:45:00 455
85 01-MAY-2018 17:45:00 01-MAY-2018 18:15:00 455
86 01-MAY-2018 18:15:00 01-MAY-2018 18:45:00 455
87 01-MAY-2018 18:45:00 01-MAY-2018 19:15:00 455
..
..
..
141 02-MAY-2018 21:45:00 02-MAY-2018 22:15:00 455
142 02-MAY-2018 22:15:00 02-MAY-2018 22:45:00 455
143 02-MAY-2018 22:45:00 02-MAY-2018 23:15:00 455
144 02-MAY-2018 23:15:00 02-MAY-2018 23:45:00 455
145 02-MAY-2018 23:45:00 03-MAY-2018 00:15:00 455
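With this approach, a record whose minute value is anything other than '00' or '30' is still processed by simply adding 30 minutes at a time, so from that point on none of the rows in the final result fall on the '00' or '30' minute marks.
I hope this makes sense.
Any input on how to convert the data into the expected format would be very helpful. Thanks!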
Answer 0 (score: 2)
This doesn't seem very elegant, but this:
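select id,
       -- clamp each nominal 30-minute window to the row's real start and end
       greatest(start_time,
                adj_start_time + numtodsinterval(30 * (level - 1), 'MINUTE')) as start_time,
       least(end_time,
             adj_start_time + numtodsinterval(30 * level, 'MINUTE')) as end_time
from (
  select id,
         start_time,
         end_time,
         -- start of the half-hour window that contains start_time (17:15 -> 17:00)
         trunc(start_time, 'HH')
           + numtodsinterval(
               case when extract(minute from cast(start_time as timestamp)) < 30 then 0
                    else 30
               end, 'MINUTE') as adj_start_time
  from testt
)
-- count windows from adj_start_time, so a row that both starts and ends off the
-- half-hour (e.g. 17:15 to 18:15) keeps its last partial slice; the one-second
-- offset guards against an exact multiple of 30 minutes rounding up to an extra slice
connect by level <= ceil((end_time - adj_start_time - 1/86400) / (30/1440))
       and prior id = id
       and prior dbms_random.value is not null
order by id, start_time;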
seems to get the desired result, generating 145 rows:
ID START_TIME END_TIME
---------- ------------------- -------------------
1 2018-04-30 00:00:00 2018-04-30 00:30:00
1 2018-04-30 00:30:00 2018-04-30 01:00:00
1 2018-04-30 01:00:00 2018-04-30 01:30:00
...
1 2018-04-30 22:30:00 2018-04-30 23:00:00
1 2018-04-30 23:00:00 2018-04-30 23:30:00
1 2018-04-30 23:30:00 2018-05-01 00:00:00
2 2018-05-01 00:00:00 2018-05-01 00:30:00
2 2018-05-01 00:30:00 2018-05-01 01:00:00
2 2018-05-01 01:00:00 2018-05-01 01:30:00
...
2 2018-05-01 16:00:00 2018-05-01 16:30:00
2 2018-05-01 16:30:00 2018-05-01 17:00:00
2 2018-05-01 17:00:00 2018-05-01 17:15:00
3 2018-05-01 17:15:00 2018-05-01 17:30:00
3 2018-05-01 17:30:00 2018-05-01 18:00:00
3 2018-05-01 18:00:00 2018-05-01 18:30:00
...
3 2018-05-02 22:30:00 2018-05-02 23:00:00
3 2018-05-02 23:00:00 2018-05-02 23:30:00
3 2018-05-02 23:30:00 2018-05-03 00:00:00
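The connect by clause in the query above combines three conditions, which can be easier to follow on a toy input. The sketch below is an illustration only: the two inline rows and the column n are made up for this example and are not part of the original data set. The first condition caps how many rows are generated per source row, the second keeps each source row in its own hierarchy, and the third is a commonly used trick: because dbms_random.value is non-deterministic, Oracle does not treat a row connecting to itself as a loop (which would otherwise typically raise ORA-01436).

```sql
-- Toy input: two hypothetical rows; n is the number of copies wanted per id.
with t as (
  select 1 as id, 2 as n from dual
  union all
  select 2 as id, 3 as n from dual
)
select id,
       level as slice_no
from t
connect by level <= n                          -- stop after n rows per source row
       and prior id = id                       -- only connect a row to itself
       and prior dbms_random.value is not null -- non-deterministic, so no loop is detected
order by id, slice_no;
```

This should return five rows: slice numbers 1 and 2 for id 1, and 1 to 3 for id 2. In the real query the cap is the number of half-hour windows the row spans rather than a stored column.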
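The inline view gets the real columns plus the start of the nominal 30-minute window that contains the start time, as adj_start_time; for 17:15, for example, it gets 17:00. The hierarchical query adds 30-minute intervals to that, and uses greatest and least to fall back to the original start/end times when they are not on the half-hour.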
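To see the adj_start_time expression on its own, here is a minimal sketch that applies it to four sample times. The sample_times rows and the output column names are assumptions chosen to cover each case; they are not taken from TESTT. The query also shows how greatest pulls the first slice back to the real start time.

```sql
-- Hypothetical sample times chosen to hit each case; not taken from TESTT.
with sample_times as (
  select to_date('2018-05-01 17:00:00', 'YYYY-MM-DD HH24:MI:SS') as start_time from dual
  union all
  select to_date('2018-05-01 17:15:00', 'YYYY-MM-DD HH24:MI:SS') from dual
  union all
  select to_date('2018-05-01 17:30:00', 'YYYY-MM-DD HH24:MI:SS') from dual
  union all
  select to_date('2018-05-01 17:45:00', 'YYYY-MM-DD HH24:MI:SS') from dual
)
select to_char(start_time, 'HH24:MI')     as real_start,
       to_char(adj_start_time, 'HH24:MI') as window_start,
       -- first slice: the nominal window start clamped to the real start
       to_char(greatest(start_time, adj_start_time), 'HH24:MI') as first_slice_start,
       to_char(adj_start_time + numtodsinterval(30, 'MINUTE'), 'HH24:MI') as first_slice_end
from (
  select start_time,
         -- same expression as in the answer's inline view
         trunc(start_time, 'HH')
           + numtodsinterval(
               case when extract(minute from cast(start_time as timestamp)) < 30 then 0
                    else 30
               end, 'MINUTE') as adj_start_time
  from sample_times
)
order by real_start;
```

For these inputs, 17:00 and 17:15 both map to a window starting at 17:00, while 17:30 and 17:45 map to 17:30. The first slices are therefore 17:00 to 17:30, 17:15 to 17:30, 17:30 to 18:00 and 17:45 to 18:00.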
For the insert, you can replace the original ID with an analytic row_number() instead of using a sequence, and add val:
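insert into testt_output(id, start_time, end_time, val)
select row_number() over (order by id, level),  -- running number instead of a sequence
       -- clamp each nominal 30-minute window to the row's real start and end
       greatest(start_time,
                adj_start_time + numtodsinterval(30 * (level - 1), 'MINUTE')) as start_time,
       least(end_time,
             adj_start_time + numtodsinterval(30 * level, 'MINUTE')) as end_time,
       val
from (
  select id,
         start_time,
         end_time,
         val,
         -- start of the half-hour window that contains start_time (17:15 -> 17:00)
         trunc(start_time, 'HH')
           + numtodsinterval(
               case when extract(minute from cast(start_time as timestamp)) < 30 then 0
                    else 30
               end, 'MINUTE') as adj_start_time
  from testt
)
-- count windows from adj_start_time, so a row that both starts and ends off the
-- half-hour keeps its last partial slice; the one-second offset guards against
-- an exact multiple of 30 minutes rounding up to an extra slice
connect by level <= ceil((end_time - adj_start_time - 1/86400) / (30/1440))
       and prior id = id
       and prior dbms_random.value is not null;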
145 rows inserted.
select * from testt_output;
ID START_TIME END_TIME VAL
---------- ------------------- ------------------- ----------
1 2018-04-30 00:00:00 2018-04-30 00:30:00 423
2 2018-04-30 00:30:00 2018-04-30 01:00:00 423
...
47 2018-04-30 23:00:00 2018-04-30 23:30:00 423
48 2018-04-30 23:30:00 2018-05-01 00:00:00 423
49 2018-05-01 00:00:00 2018-05-01 00:30:00 455
50 2018-05-01 00:30:00 2018-05-01 01:00:00 455
...
82 2018-05-01 16:30:00 2018-05-01 17:00:00 455
83 2018-05-01 17:00:00 2018-05-01 17:15:00 455
84 2018-05-01 17:15:00 2018-05-01 17:30:00 455
85 2018-05-01 17:30:00 2018-05-01 18:00:00 455
...
144 2018-05-02 23:00:00 2018-05-02 23:30:00 455
145 2018-05-02 23:30:00 2018-05-03 00:00:00 455
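As a quick sanity check on the result, a query along these lines can confirm the row count, that no slice is longer than 30 minutes, and that each slice starts where the previous one ended. This is only a sketch with assumptions: the output column names are made up for illustration, TESTT_OUTPUT is assumed to hold only the rows from the insert above (not rows left over from the earlier PL/SQL attempt), and the continuity check is meaningful only because the sample source rows are themselves back to back.

```sql
-- Assumes TESTT_OUTPUT contains only the rows produced by the insert above.
select count(*) as total_rows,
       -- slices longer than 30 minutes; rounded to whole minutes to avoid
       -- fractional-day noise in the date arithmetic
       count(case when round((end_time - start_time) * 1440) > 30 then 1 end) as too_long,
       -- slices that do not start exactly where the previous one ended
       -- (the first row has no predecessor, so its null comparison is not counted)
       count(case when prev_end_time <> start_time then 1 end) as gaps_or_overlaps
from (
  select start_time,
         end_time,
         lag(end_time) over (order by start_time) as prev_end_time
  from testt_output
);
```

For the sample data this would be expected to report 145 rows, with zero in the other two columns.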