Oracle SQL - Selecting only a few elements from a group

Date: 2016-02-23 13:08:30

Tags: sql oracle group-by

I have the following table, which represents the tasks in processes:

TASK_ID | PROCESS_ID | TASK_TYPE_ID
========+============+=============
1000    | 1          | A
1001    | 1          | B
1002    | 1          | C
1003    | 1          | D
1004    | 2          | A
1005    | 2          | C
1006    | 2          | D
1007    | 3          | A
1008    | 3          | C
1009    | 3          | D

I want to identify the distinct process types. A process type is defined by its unique sequence of tasks.

The following query

SELECT PROCESS_ID,
       COUNT(*) TASKS_NO,
       LISTAGG(TASK_TYPE_ID,'>') WITHIN GROUP (ORDER BY TASK_ID) TASK_SEQUENCE
FROM mytable
GROUP BY PROCESS_ID

isolates the task sequence of each process:

PROCESS_ID | TASKS_NO | TASK_SEQUENCE
===========+==========+==============
1          | 4        | A>B>C>D
2          | 3        | A>C>D
3          | 3        | A>C>D

Now I want to aggregate this to get the following result:

TASK_SEQUENCE | TASKS_NO | PROCESS_NO | PROC_REP_IDS
==============+==========+============+=============
A>B>C>D       | 4        | 1          | 1
A>C>D         | 3        | 2          | 2,3

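The PROCESS_NO column should give the number of processes that share the same task sequence. In addition, for each distinct task sequence (process type), the PROC_REP_IDS column should list at most three representative PROCESS_IDs. In my case there may be thousands of processes with the same task sequence, so only three PROCESS_IDs should be listed here.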

2 Answers:

Answer 0: (score: 1)

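Use your query as a subquery and aggregate again. ROW_NUMBER() numbers the processes within each task sequence, and the CASE expression passes only the first three PROCESS_IDs to LISTAGG, which ignores NULLs. COUNT(*) still counts every process in the group:

SELECT TASK_SEQUENCE, MAX(TASKS_NO) as TASKS_NO, COUNT(*) as PROCESS_NO,
       LISTAGG(CASE WHEN seqnum <= 3 THEN PROCESS_ID END, ',') WITHIN GROUP (ORDER BY PROCESS_ID) as PROC_REP_IDS
FROM (SELECT p.*,
             ROW_NUMBER() OVER (PARTITION BY TASK_SEQUENCE ORDER BY PROCESS_ID) as seqnum
      FROM (SELECT PROCESS_ID,
                   COUNT(*) as TASKS_NO,
                   LISTAGG(TASK_TYPE_ID, '>') WITHIN GROUP (ORDER BY TASK_ID) as TASK_SEQUENCE
            FROM mytable
            GROUP BY PROCESS_ID
           ) p
     ) p
GROUP BY TASK_SEQUENCE;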
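
One caveat when capping the listed ids: a row filter such as WHERE seqnum <= 3 also caps any count taken after it. The following is a minimal sketch of a variant, not the answer's own query, that computes the per-sequence process count with an analytic COUNT(*) before the filter is applied. The CTE names per_process and ranked, the column proc_cnt, and processes 4 and 5 in the sample data are assumptions added for illustration so that one sequence has more than three processes.

```sql
-- Sample data from the question plus hypothetical processes 4 and 5.
WITH mytable AS (
  SELECT 1000 TASK_ID, 1 PROCESS_ID, 'A' TASK_TYPE_ID FROM dual UNION ALL
  SELECT 1001, 1, 'B' FROM dual UNION ALL
  SELECT 1002, 1, 'C' FROM dual UNION ALL
  SELECT 1003, 1, 'D' FROM dual UNION ALL
  SELECT 1004, 2, 'A' FROM dual UNION ALL
  SELECT 1005, 2, 'C' FROM dual UNION ALL
  SELECT 1006, 2, 'D' FROM dual UNION ALL
  SELECT 1007, 3, 'A' FROM dual UNION ALL
  SELECT 1008, 3, 'C' FROM dual UNION ALL
  SELECT 1009, 3, 'D' FROM dual UNION ALL
  SELECT 1010, 4, 'A' FROM dual UNION ALL
  SELECT 1011, 4, 'C' FROM dual UNION ALL
  SELECT 1012, 4, 'D' FROM dual UNION ALL
  SELECT 1013, 5, 'A' FROM dual UNION ALL
  SELECT 1014, 5, 'C' FROM dual UNION ALL
  SELECT 1015, 5, 'D' FROM dual
),
-- One row per process with its task sequence.
per_process AS (
  SELECT PROCESS_ID,
         COUNT(*) AS TASKS_NO,
         LISTAGG(TASK_TYPE_ID, '>') WITHIN GROUP (ORDER BY TASK_ID) AS TASK_SEQUENCE
  FROM mytable
  GROUP BY PROCESS_ID
),
-- Rank processes within each sequence and count them BEFORE any filtering.
ranked AS (
  SELECT p.*,
         ROW_NUMBER() OVER (PARTITION BY TASK_SEQUENCE ORDER BY PROCESS_ID) AS seqnum,
         COUNT(*)     OVER (PARTITION BY TASK_SEQUENCE)                     AS proc_cnt
  FROM per_process p
)
SELECT TASK_SEQUENCE,
       MAX(TASKS_NO) AS TASKS_NO,
       MAX(proc_cnt) AS PROCESS_NO,   -- full count, unaffected by the WHERE below
       LISTAGG(PROCESS_ID, ',') WITHIN GROUP (ORDER BY PROCESS_ID) AS PROC_REP_IDS
FROM ranked
WHERE seqnum <= 3                      -- keep at most three representatives
GROUP BY TASK_SEQUENCE
ORDER BY TASK_SEQUENCE;
-- Expected rows for this sample data:
--   A>B>C>D | 4 | 1 | 1
--   A>C>D   | 3 | 4 | 2,3,4
```

Because analytic functions are evaluated before the outer WHERE clause, proc_cnt reflects all four A>C>D processes even though only three ids are listed.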

Answer 1: (score: 1)

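Using the FIRST function together with a sort criterion that ranks the first three items first, you should be able to get the result.

See this example query, in which I used WM_CONCAT because LISTAGG cannot be combined with FIRST. Be aware that WM_CONCAT is an undocumented function that does not guarantee the order of the concatenated values and is reported to be unavailable from Oracle 12c on, so check your database version before relying on it.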

with mytable as (
        select 1000 TASK_ID, 1 PROCESS_ID, 'A' as TASK_TYPE_ID
        from dual
        union all
        select 1001 TASK_ID, 1 PROCESS_ID, 'B' as TASK_TYPE_ID
        from dual
        union all
        select 1002 TASK_ID, 1 PROCESS_ID, 'C' as TASK_TYPE_ID
        from dual
        union all
        select 1003 TASK_ID, 1 PROCESS_ID, 'D' as TASK_TYPE_ID
        from dual
        union all
        select 1004 TASK_ID, 2 PROCESS_ID, 'A' as TASK_TYPE_ID
        from dual
        union all
        select 1005 TASK_ID, 2 PROCESS_ID, 'C' as TASK_TYPE_ID
        from dual
        union all
        select 1006 TASK_ID, 2 PROCESS_ID, 'D' as TASK_TYPE_ID
        from dual
        union all
        select 1007 TASK_ID, 3 PROCESS_ID, 'A' as TASK_TYPE_ID
        from dual
        union all
        select 1008 TASK_ID, 3 PROCESS_ID, 'C' as TASK_TYPE_ID
        from dual
        union all
        select 1009 TASK_ID, 3 PROCESS_ID, 'D' as TASK_TYPE_ID
        from dual

        union all
        select 1010 TASK_ID, 4 PROCESS_ID, 'A' as TASK_TYPE_ID
        from dual
        union all
        select 1011 TASK_ID, 4 PROCESS_ID, 'C' as TASK_TYPE_ID
        from dual
        union all
        select 1012 TASK_ID, 4 PROCESS_ID, 'D' as TASK_TYPE_ID
        from dual

        union all
        select 1013 TASK_ID, 5 PROCESS_ID, 'A' as TASK_TYPE_ID
        from dual
        union all
        select 1014 TASK_ID, 5 PROCESS_ID, 'C' as TASK_TYPE_ID
        from dual
        union all
        select 1015 TASK_ID, 5 PROCESS_ID, 'D' as TASK_TYPE_ID
        from dual


        union all
        select 1016 TASK_ID, 6 PROCESS_ID, 'A' as TASK_TYPE_ID
        from dual
        union all
        select 1017 TASK_ID, 6 PROCESS_ID, 'C' as TASK_TYPE_ID
        from dual
        union all
        select 1018 TASK_ID, 6 PROCESS_ID, 'D' as TASK_TYPE_ID
        from dual

    )
SELECT TASK_SEQUENCE, MAX(TASKS_NO) as TASKS_NO, COUNT(*) as PROCESS_NO,
    LISTAGG(PROCESS_ID, ',') WITHIN GROUP (ORDER BY PROCESS_ID) as PROC_REP_IDS,
    to_char(wm_concat(PROCESS_ID) keep (dense_rank first order by trunc((seqnum-1)/3))) as PROC_REP_IDS_limited
FROM (
        SELECT p.*,
            ROW_NUMBER() OVER (PARTITION BY TASK_SEQUENCE ORDER BY PROCESS_ID) as seqnum
        FROM (
                SELECT PROCESS_ID,
                    COUNT(*) TASKS_NO,
                    LISTAGG(TASK_TYPE_ID, '>') WITHIN GROUP (ORDER BY TASK_ID) as TASK_SEQUENCE
                FROM mytable
                GROUP BY PROCESS_ID
            ) p
    ) p
GROUP BY TASK_SEQUENCE
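
The sort key TRUNC((seqnum-1)/3) in the KEEP clause is what limits the list: it maps seqnum 1 to 3 to the value 0, seqnum 4 to 6 to 1, and so on, so DENSE_RANK FIRST retains only the rows in the lowest bucket. A minimal sketch to see the mapping follows. CONNECT BY LEVEL is used here only to generate test numbers, and the column name bucket is made up for illustration.

```sql
-- Generate seqnum values 1..7 and show the KEEP sort key for each.
SELECT LEVEL                  AS seqnum,
       TRUNC((LEVEL - 1) / 3) AS bucket
FROM dual
CONNECT BY LEVEL <= 7
ORDER BY seqnum;
-- seqnum 1, 2, 3 -> bucket 0  (kept by DENSE_RANK FIRST)
-- seqnum 4, 5, 6 -> bucket 1
-- seqnum 7       -> bucket 2
```

Changing the divisor from 3 to another number changes how many representative ids are kept per task sequence.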