Oracle SQL: How to get random records for each group with a predefined contribution

Time: 2016-04-25 13:58:24

Tags: sql oracle

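This is a follow-up to an earlier question described here: Oracle SQL: How to get Random Records by each group

Question:

Is it possible to get a random sample with a different ratio for each category?

Ex: If I want a random sample of 132 records spread over 3 categories (approved, denied, canceled), how do I get the sample in the following ratios?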

total sample = 132

category     sample %   sample size
approved     50%        66
denied       30%        40
canceled     20%        26

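Note: I need the raw data, not the counts.

1 Answer:

Answer 0: (score: 0)

Let's start with some sample data. I created 132 records in the approved category, so that a 50% sample contains 66 rows. The denied and canceled categories have 134 and 130 rows, so their 30% and 20% samples come out at 40 and 26 rows.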

create table task as
select 'approved' category, rownum task_id from dual connect by level <= 132 union all
select 'denied' category, rownum task_id from dual connect by level <= 134 union all
select 'canceled' category, rownum task_id from dual connect by level <= 130 
;

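The key step is to define, for each category, a column RAND_PERC whose values are greater than 0 and at most 1. If you want a 50% sample, select all rows in the category whose value is less than or equal to .5.

The column is computed by first assigning row numbers in random order (independently for each category) and then dividing them by the number of rows in the category.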

select CATEGORY, TASK_ID, 
 ( row_number() over (partition by task.category order by dbms_random.value)) / 
 ( count(*) over (partition by task.category)) as rand_perc
from task
order by 1,3;

CATEGORY    TASK_ID  RAND_PERC
-------- ---------- ----------
approved         56 ,00757575758 
approved        129 ,0151515152 
approved         61 ,0227272727 
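The link between the threshold and the resulting sample size can be written out as a short calculation. The symbols below ($r_i$ for the random row number, $n_c$ for the number of rows in category $c$, $p_c$ for the requested fraction) are notation introduced here for illustration, not part of the original answer.

```latex
% Within a category c, row_number() hands out each of 1..n_c exactly once,
% so RAND_PERC takes the values 1/n_c, 2/n_c, ..., 1.
\[
  \text{rand\_perc}_i = \frac{r_i}{n_c}, \qquad r_i \in \{1, \dots, n_c\}
\]
% The filter rand_perc <= p_c therefore keeps exactly floor(p_c * n_c) rows.
\[
  \#\Bigl\{\, i : \frac{r_i}{n_c} \le p_c \Bigr\} = \lfloor p_c \, n_c \rfloor
\]
% With the sample data above:
\[
  \lfloor 0.5 \cdot 132 \rfloor = 66, \qquad
  \lfloor 0.3 \cdot 134 \rfloor = \lfloor 40.2 \rfloor = 40, \qquad
  \lfloor 0.2 \cdot 130 \rfloor = 26
\]
```

Which rows end up in the sample is random, but the number of rows per category is fixed by the threshold and the category size.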

To draw the sample, define the WHERE condition as required; see the example below.

with rnd as (
select CATEGORY, TASK_ID, 
 ( row_number() over (partition by task.category order by dbms_random.value)) / 
 ( count(*) over (partition by task.category)) as rand_perc
from task
)
select CATEGORY, count(*) cnt
from rnd
where 
(category = 'approved' and rand_perc <= .5) or /* take 50% from approved */
(category = 'denied'   and rand_perc <= .3) or /* take 30% from denied */
(category = 'canceled' and rand_perc <= .2)    /* take 20% from canceled */
group by CATEGORY
;

This gives the sample sizes as required:

CATEGORY        CNT
-------- ----------
canceled         26 
denied           40 
approved         66
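Since the question asks for the raw data rather than the counts, the same filter can be applied without the GROUP BY. The following is a minimal sketch, assuming the TASK table created above with its CATEGORY and TASK_ID columns; the final ORDER BY is only there to make the output easier to read.

```sql
-- Sketch: return the sampled rows themselves instead of per-category counts.
-- Assumes the TASK table from the sample data above.
with rnd as (
  select category, task_id,
         -- random position within the category, scaled to (0, 1]
         row_number() over (partition by category order by dbms_random.value)
           / count(*) over (partition by category) as rand_perc
  from task
)
select category, task_id
from rnd
where (category = 'approved' and rand_perc <= .5)   -- 50% of approved
   or (category = 'denied'   and rand_perc <= .3)   -- 30% of denied
   or (category = 'canceled' and rand_perc <= .2)   -- 20% of canceled
order by category, task_id;
```

Each run picks a different set of TASK_ID values, because DBMS_RANDOM.VALUE is re-evaluated, while the number of rows per category stays the same as in the counts above.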