Group results based on RowCount in Oracle

Date: 2014-10-05 06:32:45

Tags: sql oracle plsql

I have a requirement where I need to output grouped results based on a running row count.

Here is the result set I get from my SQL:

ID      Date        Count
1     10/01/2013    50
1     10/02/2013    25
1     10/03/2013    100
1     10/04/2013    200 
1     10/05/2013    175
1     10/06/2013     45
2     10/01/2013     85
2     10/02/2013    100 

Can I get them as:

    ID      Date        Count
    1     10/03/2013    175
    1     10/04/2013    200
    1     10/05/2013    175
    1     10/06/2013     45
    2     10/02/2013    185

I need to reduce the result set by grouping rows so that the count per ID stays <= 200. For example, the counts for 10/01, 10/02 and 10/03 sum to 175, so I need to collapse them into a single row. Since adding the values for 10/05 and 10/06 would be > 200, those rows stay ungrouped.

Is it possible to solve this in Oracle 11g with PL/SQL or SQL analytic functions?

Update (new requested result): Is there a way to return the result with an additional column, StartD? For each row, it must take the previous row's end date:

ID      StartD      EndDate     Count
1       10/01/2013  10/03/2013   175
1       10/03/2013  10/04/2013   200
1       10/04/2013  10/05/2013   250
1       10/05/2013  10/06/2013   190
1       10/06/2013  10/08/2013    45
2       10/01/2013  10/01/2013   185

3 Answers:

Answer 0 (score: 3)

You can do this in Oracle 12c with the MATCH_RECOGNIZE pattern-matching clause.

Setup (a few rows added for testing, including some with counts greater than 200):

create table stuff (id int, stamp date, num int);
insert into stuff values (1, to_date('10/01/2013', 'MM/DD/RRRR'), 50);
insert into stuff values (1, to_date('10/02/2013', 'MM/DD/RRRR'), 25);
insert into stuff values (1, to_date('10/03/2013', 'MM/DD/RRRR'), 100);
insert into stuff values (1, to_date('10/04/2013', 'MM/DD/RRRR'), 200);
insert into stuff values (1, to_date('10/05/2013', 'MM/DD/RRRR'), 250);
insert into stuff values (1, to_date('10/06/2013', 'MM/DD/RRRR'), 175);
insert into stuff values (1, to_date('10/07/2013', 'MM/DD/RRRR'), 15);
insert into stuff values (1, to_date('10/08/2013', 'MM/DD/RRRR'), 45);
insert into stuff values (2, to_date('10/01/2013', 'MM/DD/RRRR'), 85);
insert into stuff values (2, to_date('10/02/2013', 'MM/DD/RRRR'), 100);
commit;

The query would be:

select id, first_stamp, last_stamp, partial_sum
from stuff
match_recognize (
    partition by id order by stamp
    measures
      first(a.stamp) as first_stamp
    , last(a.stamp)  as last_stamp
    , sum(a.num)     as partial_sum
    pattern (A+)
    define A as (sum(a.num) <= 200 or (count(*) = 1 and a.num > 200))
);

This gives:

        ID FIRST_STAMP LAST_STAMP PARTIAL_SUM
---------- ----------- ---------- -----------
         1 01-OCT-13   03-OCT-13          175 
         1 04-OCT-13   04-OCT-13          200 
         1 05-OCT-13   05-OCT-13          250 
         1 06-OCT-13   07-OCT-13          190 
         1 08-OCT-13   08-OCT-13           45 
         2 01-OCT-13   02-OCT-13          185 

 6 rows selected 

How this works:

  • Pattern matching runs over the whole table, partitioned by id and ordered by the timestamp.
  • The pattern A+ means we want consecutive groups of rows (per the partition and order by clauses) satisfying condition A.
  • Condition A holds when:
    • the sum of num over the set is 200 or less,
    • or the set is a single row whose num is greater than 200 (otherwise such rows would never match and would never be output).
  • The measures clause specifies what each match returns (on top of the partition key):
    • the first and last timestamp of each group
    • the sum of num over each group
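
For the updated request (a StartD column carrying the previous group's end date), one option is to wrap the MATCH_RECOGNIZE query and apply LAG over the resulting groups. A sketch (not from the original answer), where the first group of each id falls back to its own first timestamp:

-- Sketch for the updated StartD request (not in the original answer):
-- each group's StartD is the previous group's end date, or the group's
-- own first date for the first group of an id.
select id,
       nvl(lag(last_stamp) over (partition by id order by first_stamp),
           first_stamp) as startd,
       last_stamp       as enddate,
       partial_sum
from (
  select id, first_stamp, last_stamp, partial_sum
  from stuff
  match_recognize (
      partition by id order by stamp
      measures
        first(a.stamp) as first_stamp
      , last(a.stamp)  as last_stamp
      , sum(a.num)     as partial_sum
      pattern (A+)
      define A as (sum(a.num) <= 200 or (count(*) = 1 and a.num > 200))
  )
);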

Here is an approach with a pipelined table function that should work in 11g (and, I think, 10g). Rather inelegant, but it does the job: it walks the table in order and emits each group as soon as it is "full".

You could also add a parameter for the group size; a sketch of that variant follows the usage example below.

create or replace 
type my_row is object (id int, stamp date, num int);

create or replace 
type my_tab as table of my_row;

create or replace
  function custom_stuff_groups
    return my_tab pipelined
  as
    cur_sum number;
    cur_id  number;
    cur_dt  date;
  begin
    cur_sum := null;
    cur_id  := null;
    cur_dt  := null;
    for x in (select id, stamp, num from stuff order by id, stamp)
    loop
      if (cur_sum is null) then
        -- very first row
        cur_id      := x.id;
        cur_sum     := x.num;
      elsif (cur_id != x.id) then
        -- changed ID, so output last line for previous id and reset
        pipe row(my_row(cur_id, cur_dt, cur_sum));
        cur_id              := x.id;
        cur_sum             := x.num;
      elsif (cur_sum + x.num > 200) then
        -- same id, sum overflows.
        pipe row(my_row(cur_id, cur_dt, cur_sum));
        cur_sum := x.num;
      else
        -- same id, sum still below 200
        cur_sum := cur_sum + x.num;
      end if;
      cur_dt := x.stamp;
    end loop;
    if (cur_sum is not null) then
      -- output the last group, if any
      pipe row(my_row(cur_id, cur_dt, cur_sum));
    end if;
    -- end the pipelined result set with a bare RETURN
    return;
  end;

Use it as:

select * from table(custom_stuff_groups());
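
A sketch of the parameterized variant mentioned above; the p_limit name is illustrative, and the body is otherwise the same loop:

create or replace
  function custom_stuff_groups(p_limit in number)
    return my_tab pipelined
  as
    cur_sum number;
    cur_id  number;
    cur_dt  date;
  begin
    for x in (select id, stamp, num from stuff order by id, stamp)
    loop
      if (cur_sum is null) then
        -- very first row
        cur_id  := x.id;
        cur_sum := x.num;
      elsif (cur_id != x.id) then
        -- changed ID: emit the last group of the previous id and reset
        pipe row(my_row(cur_id, cur_dt, cur_sum));
        cur_id  := x.id;
        cur_sum := x.num;
      elsif (cur_sum + x.num > p_limit) then
        -- same id, adding this row would exceed the limit parameter
        pipe row(my_row(cur_id, cur_dt, cur_sum));
        cur_sum := x.num;
      else
        -- same id, still within the limit
        cur_sum := cur_sum + x.num;
      end if;
      cur_dt := x.stamp;
    end loop;
    if (cur_sum is not null) then
      pipe row(my_row(cur_id, cur_dt, cur_sum));
    end if;
    return;
  end;

called with the limit you want:

select * from table(custom_stuff_groups(200));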

Answer 1 (score: 2)

This returns the expected result for your sample data. I'm not 100% sure it works for all cases, though (and it probably won't be very efficient):

with summed_values as (
  select stuff.*,
         case 
             when sum(cnt) over (partition by id order by count_date) >= 200 then 1
             else 0
         end as sum_group
  from stuff
), totals as (
  select id,
         max(count_date) as last_count,
         sum(cnt) as total_count
  from summed_values
  where sum_group = 0
  group by id
  union all
  select id,
         count_date as last_count,
         sum(cnt) as total_count
  from summed_values
  where sum_group = 1
  group by id, count_date
)
select *
from totals
order by id, last_count
;

SQLFiddle example: http://sqlfiddle.com/#!4/4e0d8/1
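
To see what the first CTE does before aggregation, you can run it on its own (a sketch using the count_date/cnt column names from the SQLFiddle schema, which differ from the setup in answer 0). Rows whose running total is still below 200 get sum_group = 0 and are collapsed into one row per id; rows at or beyond the threshold get sum_group = 1 and are kept individually:

-- Sketch: inspect the running total that drives the sum_group flag
-- (column names follow the SQLFiddle schema, not the table in answer 0)
select id, count_date, cnt,
       sum(cnt) over (partition by id order by count_date) as running_total,
       case
           when sum(cnt) over (partition by id order by count_date) >= 200 then 1
           else 0
       end as sum_group
from stuff
order by id, count_date;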

Answer 2 (score: 1)

For this kind of task you can use a pipelined table function to generate the desired result.

There is a bit of "plumbing" since it needs a couple of extra type definitions, but the function itself is a simple cursor loop that accumulates values and emits a row whenever the id changes or the running total would exceed the limit.

You can implement it in many ways. Here, by using a plain old LOOP rather than a FOR ... IN cursor loop, I get something not quite so inelegant:

CREATE OR REPLACE TYPE stuff_row AS OBJECT (
  id          int,
  stamp       date,
  last_stamp  date,
  num         int
);
CREATE OR REPLACE TYPE stuff_tbl AS TABLE OF stuff_row;
CREATE OR REPLACE FUNCTION partition_by_200
RETURN stuff_tbl PIPELINED
AS
  CURSOR data IS SELECT id, stamp, num FROM stuff ORDER BY id, stamp;
  curr data%ROWTYPE;
  acc  stuff_row := stuff_row(NULL,NULL,NULL,NULL);
BEGIN
  OPEN data;
  FETCH data INTO acc.id,acc.stamp,acc.num;
  acc.last_stamp := acc.stamp;

  IF data%FOUND THEN
  LOOP
    FETCH data INTO curr;

    -- group boundary: end of data, id change, or the sum would overflow
    IF data%NOTFOUND OR curr.id <> acc.id OR acc.num+curr.num > 200
    THEN
      PIPE ROW(stuff_row(acc.id,acc.stamp,acc.last_stamp,acc.num));
      EXIT WHEN data%NOTFOUND;

      -- reset the accumulator
      acc := stuff_row(curr.id, curr.stamp, curr.stamp, curr.num);
    ELSE
      -- accumulate value
      acc.num := acc.num + curr.num;
      acc.last_stamp := curr.stamp;
    END IF;
  END LOOP;
  END IF;

  CLOSE data;
  RETURN;
END;

Usage:

SELECT * FROM TABLE(partition_by_200());

Using the same test data as Mat in his own answer, this yields:

ID  STAMP       LAST_STAMP  NUM
1   10/01/2013  10/03/2013  175
1   10/04/2013  10/04/2013  200
1   10/05/2013  10/05/2013  250
1   10/06/2013  10/07/2013  190
1   10/08/2013  10/08/2013  45
2   10/01/2013  10/02/2013  185
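
The updated StartD request can be layered on top of this 11g-compatible function with LAG as well. A sketch, not part of the original answer:

-- Sketch: StartD is the previous group's end date within each id,
-- falling back to the group's own start date for the first group.
select id,
       nvl(lag(last_stamp) over (partition by id order by stamp),
           stamp)   as startd,
       last_stamp   as enddate,
       num          as cnt
from table(partition_by_200())
order by id, stamp;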