Oracle stored procedure - number of heat waves (count of pattern matches in a series)

Date: 2017-07-21 13:39:36

Tags: oracle

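I need to post-process an Oracle data set in order to find the number of heat waves. By definition, a heat wave occurs when the data value is above a threshold at least two times in a row.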

For example, given threshold = 20 and the sequence

23 31 32 17 16 23 16 21 22 18

there are 2 heat waves:

{23,31,32} and {21,22}

The longest one has length 3 (the size of the larger subset).

My input data set consists of several sequences; a sample input result set is:

 -----------------------------
|  ID   |    DAY   |   VALUE |
 -----------------------------
|   100 |   1/1/17 |    20   |
|   100 |   2/1/17 |    21   |
|   200 |   1/1/17 |    12   | 
|   200 |   2/1/17 |    24   |
|   ...     ...        ...

In other words, there is one sequence per ID, and I need to output something like:

 -----------------------
|  ID   |  #heat waves |
 -----------------------
|   100 |      3       |
|   200 |      1       |

This is the current version of my stored procedure:

create or replace PROCEDURE sp (
p_query  IN VARCHAR2,
cursor_  out sys_refcursor
) AS
    processed     processed_data_table := processed_data_table();
    c sys_refcursor;
BEGIN
    OPEN c FOR p_query;
    processed.EXTEND;
    processed(processed.count) := processed_data_obj();
    fetch c INTO processed(processed.count).ID, 
    processed(processed.count).DAY, processed(processed.count).VALUE;
    while c%found loop
        processed.EXTEND;
        processed(processed.count) := processed_data_obj();
        fetch c INTO processed(processed.count).ID, 
        processed(processed.count).DAY, processed(processed.count).VALUE;

    END loop;
    CLOSE c;
    processed.TRIM;

    -- HERE I NEED TO PROCESS processed TABLE AND STORE RESULT IN output TABLE

    OPEN cursor_ FOR
    SELECT *
    FROM   TABLE( output);
END sp; 

Can anyone help me with a solution?

Thanks

1 answer:

Answer 0: (score: 5)

In Oracle 12c, use MATCH_RECOGNIZE:

select id, count(*) "# of heatwaves" from series_data
match_recognize ( partition by id
                  order by day
                  one row per match
                  after match skip past last row
                  pattern ( over_threshold{2,} )
                  define 
                    over_threshold as value > 20 )
group by id;

Update: also show the longest heat wave for each series

To get the longest heat wave in each series, we have to add a MEASURES clause to MATCH_RECOGNIZE, as follows:

select id, 
       max(heatwave_length) "longest heatwave", 
       count(distinct heatwave_number) "# of heatwaves" 
from series_data 
match_recognize ( partition by id
                  order by day
                  measures
                    FINAL COUNT(*) as heatwave_length,
                    MATCH_NUMBER() heatwave_number
                  all rows per match
                  after match skip past last row
                  pattern ( over_threshold{2,} )
                  define 
                    over_threshold as value > 20 ) 
group by id 
order by id;
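As a possible variant (a sketch, not part of the original answer), the same two figures can probably be obtained with ONE ROW PER MATCH, because each match then produces exactly one output row: COUNT(*) in MEASURES gives the length of that match, and a plain COUNT(*) in the outer query counts the matches. The sample rows below are an assumption for illustration; they reuse the sequence from the question under an arbitrary ID of 100.

```sql
-- Sketch: ONE ROW PER MATCH variant. Sample data is illustrative only.
with series_data ( id, day, value ) as
( select 100, date '2017-01-01', 23 from dual union all
  select 100, date '2017-01-02', 31 from dual union all
  select 100, date '2017-01-03', 32 from dual union all
  select 100, date '2017-01-04', 17 from dual union all
  select 100, date '2017-01-05', 16 from dual union all
  select 100, date '2017-01-06', 23 from dual union all
  select 100, date '2017-01-07', 16 from dual union all
  select 100, date '2017-01-08', 21 from dual union all
  select 100, date '2017-01-09', 22 from dual union all
  select 100, date '2017-01-10', 18 from dual )
select id,
       max(heatwave_length) "longest heatwave",
       count(*)             "# of heatwaves"   -- one output row per heat wave
from series_data
match_recognize ( partition by id
                  order by day
                  measures
                    count(*) as heatwave_length  -- number of rows in the match
                  one row per match
                  after match skip past last row
                  pattern ( over_threshold{2,} )
                  define
                    over_threshold as value > 20 )
group by id
order by id;
```

For this sequence the query should report a longest heat wave of 3 and 2 heat waves, in line with the example in the question. As with the queries above, an ID that has no heat wave at all does not appear in the output.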

Full example with data:

with series_data ( id, day, value ) as 
( SELECT 100, date '2017-01-01', 23 from dual union all
  SELECT 100, date '2017-01-02', 31 from dual union all
  SELECT 100, date '2017-01-03', 32 from dual union all
  SELECT 100, date '2017-01-04', 44 from dual union all
  SELECT 100, date '2017-01-05', 16 from dual union all
  SELECT 100, date '2017-01-06', 23 from dual union all
  SELECT 100, date '2017-01-07', 16 from dual union all
  SELECT 100, date '2017-01-08', 21 from dual union all
  SELECT 100, date '2017-01-09', 22 from dual union all
  SELECT 100, date '2017-01-10', 18 from dual union all
  SELECT 200, date '2017-01-01', 23 from dual union all
  SELECT 200, date '2017-01-02', 31 from dual union all
  SELECT 200, date '2017-01-03', 32 from dual union all
  SELECT 200, date '2017-01-04', 17 from dual union all
  SELECT 200, date '2017-01-05', 16 from dual union all
  SELECT 200, date '2017-01-06', 23 from dual union all
  SELECT 200, date '2017-01-07', 16 from dual union all
  SELECT 200, date '2017-01-08', 21 from dual union all
  SELECT 200, date '2017-01-09', 22 from dual union all
  SELECT 200, date '2017-01-10', 22 from dual union all
  SELECT 200, date '2017-01-11', 6 from dual union all
  SELECT 200, date '2017-01-12', 22 from dual union all
  SELECT 200, date '2017-01-13', 22 from dual )
select id, 
       max(heatwave_length) "longest heatwave", 
       count(distinct heatwave_number) "# of heatwaves" 
from series_data 
match_recognize ( partition by id
                  order by day
                  measures
                    FINAL COUNT(*) as heatwave_length,
                    MATCH_NUMBER() heatwave_number
                  all rows per match
                  after match skip past last row
                  pattern ( over_threshold{2,} )
                  define 
                    over_threshold as value > 20 ) 
group by id 
order by id;

Result:

ID       longest heatwave   # of heatwaves
-----    --------------     --------------
100      4                  2
200      3                  3
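MATCH_RECOGNIZE is only available from Oracle 12c onwards. For earlier releases, a rough alternative (not from the original answer) is the common "gaps and islands" technique with analytic functions: the difference between two ROW_NUMBER() values stays constant within a run of consecutive rows above the threshold, so it can serve as a group key. The sketch below assumes the same series_data columns and the threshold of 20; the sample rows reuse the sequence from the question under an arbitrary ID of 100, and the names flagged, runs, grp and run_length are made up for illustration.

```sql
-- Sketch: gaps-and-islands alternative without MATCH_RECOGNIZE.
-- Sample data and helper names are illustrative assumptions.
with series_data ( id, day, value ) as
( select 100, date '2017-01-01', 23 from dual union all
  select 100, date '2017-01-02', 31 from dual union all
  select 100, date '2017-01-03', 32 from dual union all
  select 100, date '2017-01-04', 17 from dual union all
  select 100, date '2017-01-05', 16 from dual union all
  select 100, date '2017-01-06', 23 from dual union all
  select 100, date '2017-01-07', 16 from dual union all
  select 100, date '2017-01-08', 21 from dual union all
  select 100, date '2017-01-09', 22 from dual union all
  select 100, date '2017-01-10', 18 from dual ),
flagged as
( -- consecutive rows above the threshold end up with the same grp value
  select id, value,
         row_number() over ( partition by id order by day )
       - row_number() over ( partition by id,
                                          case when value > 20 then 1 else 0 end
                             order by day ) as grp
  from series_data ),
runs as
( -- one row per run of at least two consecutive values above the threshold
  select id, grp, count(*) as run_length
  from flagged
  where value > 20
  group by id, grp
  having count(*) >= 2 )
select id,
       max(run_length) "longest heatwave",
       count(*)        "# of heatwaves"
from runs
group by id
order by id;
```

For the question's sequence this should return a longest heat wave of 3 and 2 heat waves. The isolated value 23 on 2017-01-06 forms a run of length 1 and is dropped by the HAVING clause.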