Oracle LISTAGG: top 3 most frequent values, given in one column, grouped by ID

Date: 2016-06-30 15:48:04

Tags: sql oracle rank listagg

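I have a question about an SQL query. It could be written in "plain" SQL, but I am sure I need some kind of group concatenation (and I can't use MySQL), so the second option is the Oracle dialect, since an Oracle database will be available. Let's assume we have the following entity: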

Table: veterinary visits

Visit_Id, 
Animal_id, 
Veterinarian_id, 
Sickness_code

Let's say there are 100 visits (100 visit_id values), with about 20 visits per animal_id.

I need to create a SELECT, grouped by Animal_id, with 3 columns:

  • animal_id
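  • the second column shows the total number of flu visits for this particular animal (let's say flu is sickness_code = 5)
  • the third column shows the top three sickness codes for each animal (the 3 codes that occur most often for this particular animal_id)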

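How can this be done? The first and second columns are easy, but what about the third? I know I need to use Oracle's LISTAGG, OVER (PARTITION BY ...), COUNT and RANK. I tried to tie them together, but it did not work out as I expected :( What should this query look like?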

2 answers:

Answer 0: (score: 1)

I think the most natural way is to use two levels of aggregation, along with some window functions:

select vas.animal_id,
       -- cnt is the number of visits per (animal_id, sickness_code); code 5 is flu
       sum(case when sickness_code = 5 then cnt else 0 end) as numflu,
       -- rows outside the top 3 yield NULL here
       listagg(case when seqnum <= 3 then sickness_code end, ',') within group (order by seqnum) as top3sicknesses
from (select animal_id, sickness_code, count(*) as cnt,
             row_number() over (partition by animal_id order by count(*) desc) as seqnum
      from visits
      group by animal_id, sickness_code
     ) vas
group by vas.animal_id;

This relies on the fact that listagg() ignores NULL values.
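To see the NULL-skipping behaviour in isolation, here is a minimal sketch that needs no tables. The inline view and the alias n are purely illustrative; they generate the numbers 1 to 5 from dual, and the CASE expression turns everything above 3 into NULL.

```sql
-- Illustrative only: n runs from 1 to 5; values above 3 become NULL
-- and are skipped by LISTAGG, so no empty entries or trailing commas appear.
select listagg(case when n <= 3 then n end, ',')
         within group (order by n) as first_three
from (select level as n
      from dual
      connect by level <= 5);
-- FIRST_THREE
-- -----------
-- 1,2,3
```

This is the same trick the query above uses with seqnum: rows ranked fourth or lower contribute NULL and simply drop out of the concatenated list.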

Answer 1: (score: 1)

Here is some sample data:

create table VET as
select 
rownum+1 Visit_Id, 
mod(rownum+1,5) Animal_id, 
cast(NULL as number)  Veterinarian_id, 
trunc(10*dbms_random.value)+1 Sickness_code
from dual
connect by level <=100;

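Query

Basically, the subqueries do the following:

aggregate the counts and calculate the flu count (over all records of the animal)

calculate the RANK (if you really need only 3 records, use ROW_NUMBER; see the discussion below)

filter the top 3 by RANK

LISTAGG the result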

with agg as (
select Animal_id, Sickness_code, count(*) cnt,
sum(case when SICKNESS_CODE = 5 then 1 else 0 end) over (partition by animal_id) as cnt_flu
from vet
group by Animal_id, Sickness_code
), agg2 as (
select ANIMAL_ID, SICKNESS_CODE, CNT, cnt_flu,
rank() OVER (PARTITION BY ANIMAL_ID ORDER BY cnt DESC) rnk
from agg
), agg3 as (
select ANIMAL_ID, SICKNESS_CODE, CNT, CNT_FLU, RNK
from agg2
where rnk <= 3
)
select 
ANIMAL_ID, max(CNT_FLU) CNT_FLU,
LISTAGG(SICKNESS_CODE||'('||CNT||')', ', ') WITHIN GROUP (ORDER BY rnk)  as   cnt_lts
from agg3
group by ANIMAL_ID 
order by 1;

This gives:

 ANIMAL_ID    CNT_FLU CNT_LTS                                     
---------- ---------- ---------------------------------------------
         0          1 6(5), 1(4), 9(3)                              
         1          1 1(5), 3(4), 2(3), 8(3)                        
         2          0 1(5), 10(3), 4(3), 6(3), 7(3)                 
         3          1 5(4), 2(3), 4(3), 7(3)                        
         4          1 2(5), 10(4), 1(2), 3(2), 5(2), 7(2), 8(2) 

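I deliberately show the visit count next to each Sickness_code to demonstrate that the top 3 can contain ties, which you should handle. Check the RANK function. In this case, using ROW_NUMBER is not deterministic.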
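If exactly three codes per animal are required, one possible way to make ROW_NUMBER deterministic is to add a tie-breaker to its ORDER BY. The sketch below assumes the VET sample table above and, as an arbitrary choice not taken from the question, lets the lower sickness_code win a tie. It also sums count(*) inside the window function. In the query above, the windowed SUM runs over rows that are already grouped, so CNT_FLU is 1 whenever the animal has any code-5 row (animal 3 shows 5(4) but CNT_FLU = 1) rather than the number of flu visits.

```sql
-- Sketch only: assumes the VET sample table defined above.
with agg as (
  select animal_id,
         sickness_code,
         count(*) as cnt,
         -- sum the per-group visit counts, so this is the number of flu visits
         sum(case when sickness_code = 5 then count(*) else 0 end)
           over (partition by animal_id) as cnt_flu,
         -- assumption: ties on the count are broken by the lower sickness_code
         row_number() over (partition by animal_id
                            order by count(*) desc, sickness_code) as rn
  from vet
  group by animal_id, sickness_code
)
select animal_id,
       max(cnt_flu) as cnt_flu,
       listagg(sickness_code || '(' || cnt || ')', ', ')
         within group (order by rn) as top3
from agg
where rn <= 3
group by animal_id
order by animal_id;
```

Whether silently dropping tied codes is acceptable depends on the requirement; if it is not, keeping RANK and accepting more than three entries, as in the query above, is the safer choice.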