我需要对Oracle数据集进行后处理,以便找到热浪的数量。 根据定义,当数据值至少连续两次大于阈值时会发生热浪。
例如,给定阈值= 20且序列
23 31 32 17 16 23 16 21 22 18
热浪是2:
{23,31,32} and {21,22}
,最长的长度为3(较大子集的大小)
我的输入数据集由几个序列组成;样本输入结果集是:
-----------------------------
| ID | DAY | VALUE |
-----------------------------
| 100 | 1/1/17 | 20 |
| 100 | 2/1/17 | 21 |
| 200 | 1/1/17 | 12 |
| 200 | 2/1/17 | 24 |
| ... ... ...
换句话说,每个ID都有一个序列,我需要输出类似的东西:
-----------------------
| ID | #heat waves |
-----------------------
| 100 | 3 |
| 200 | 1 |
这是我的存储过程的当前版本:
create or replace PROCEDURE sp (
p_query IN VARCHAR2,
cursor_ out sys_refcursor
) AS
processed processed_data_table := processed_data_table();
c sys_refcursor;
BEGIN
OPEN c FOR p_query;
processed.EXTEND;
processed(processed.count) := processed_data_obj();
fetch c INTO processed(processed.count).ID,
processed(processed.count).DAY, processed(processed.count).VALUE;
while c%found
processed.EXTEND;
processed(processed.count) := processed_data_obj();
fetch c INTO processed(processed.count).ID,
processed(processed.count).DAY, processed(processed.count).VALUE;
END loop;
CLOSE c;
processed.TRIM;
// HERE I NEED TO PROCESS processed TABLE AND STORE RESULT IN output
TABLE
OPEN cursor_ FOR
SELECT *
FROM TABLE( output);
END sp;
任何人都可以帮我提供解决方案吗?
由于
答案 0 :(得分:5)
在Oracle 12c中,使用MATCH_RECOGNIZE
:
select id, count(*) "# of heatwaves" from series_data
match_recognize ( partition by id
order by day
one row per match
after match skip past last row
pattern ( over_threshold{2,} )
define
over_threshold as value > 20 )
group by id
要获得每个系列中最长的热浪,我们必须向MEASURES
引入MATCH_RECOGNIZE
子句,如下所示:
select id,
max(heatwave_length) "longest heatwave",
count(distinct heatwave_number) "# of heatwaves"
from series_data
match_recognize ( partition by id
order by day
measures
FINAL COUNT(*) as heatwave_length,
MATCH_NUMBER() heatwave_number
all rows per match
after match skip past last row
pattern ( over_threshold{2,} )
define
over_threshold as value > 20 )
group by id
order by id;
数据的完整示例:
with series_data ( id, day, value ) as
( SELECT 100, date '2017-01-01', 23 from dual union all
SELECT 100, date '2017-01-02', 31 from dual union all
SELECT 100, date '2017-01-03', 32 from dual union all
SELECT 100, date '2017-01-04', 44 from dual union all
SELECT 100, date '2017-01-05', 16 from dual union all
SELECT 100, date '2017-01-06', 23 from dual union all
SELECT 100, date '2017-01-07', 16 from dual union all
SELECT 100, date '2017-01-08', 21 from dual union all
SELECT 100, date '2017-01-09', 22 from dual union all
SELECT 100, date '2017-01-10', 18 from dual union all
SELECT 200, date '2017-01-01', 23 from dual union all
SELECT 200, date '2017-01-02', 31 from dual union all
SELECT 200, date '2017-01-03', 32 from dual union all
SELECT 200, date '2017-01-04', 17 from dual union all
SELECT 200, date '2017-01-05', 16 from dual union all
SELECT 200, date '2017-01-06', 23 from dual union all
SELECT 200, date '2017-01-07', 16 from dual union all
SELECT 200, date '2017-01-08', 21 from dual union all
SELECT 200, date '2017-01-09', 22 from dual union all
SELECT 200, date '2017-01-10', 22 from dual union all
SELECT 200, date '2017-01-11', 6 from dual union all
SELECT 200, date '2017-01-12', 22 from dual union all
SELECT 200, date '2017-01-13', 22 from dual )
select id,
max(heatwave_length) "longest heatwave",
count(distinct heatwave_number) "# of heatwaves"
from series_data
match_recognize ( partition by id
order by day
measures
FINAL COUNT(*) as heatwave_length,
MATCH_NUMBER() heatwave_number
all rows per match
after match skip past last row
pattern ( over_threshold{2,} )
define
over_threshold as value > 20 )
group by id
order by id;
结果:
ID longest heatwave # of heatwaves
----- -------------- --------------
100 4 2
200 3 3