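This follows up on an earlier question described here: Oracle SQL: How to get Random Records by each group
Question:
Is it possible to get a random sample with a different ratio for each category?
Ex: if I need a random sample of 132 records across 3 categories (approved, denied, canceled), how can I get the sample in the following ratios?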
total sample = 132
category   sample %   sample size
approved   50%        66
denied     30%        40
canceled   20%        26
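Note: I need the raw data, not the counts.
Answer 0 (score: 0)
Let's set up some sample data first. I created 132 rows in the approved category so that a 50% sample yields 66 rows; likewise, 134 denied rows and 130 canceled rows yield 40 and 26 rows at 30% and 20%.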
create table task as
select 'approved' category, rownum task_id from dual connect by level <= 132 union all
select 'denied' category, rownum task_id from dual connect by level <= 134 union all
select 'canceled' category, rownum task_id from dual connect by level <= 130
;
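The key step is to define, within each category, a column RAND_PERC whose value is greater than 0 and at most 1.
To take a 50% sample from a category, select all of its rows whose RAND_PERC is less than or equal to .5.
The column is computed by first assigning row numbers in random order (independently for each category) and then dividing each row number by the number of rows in its category.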
select CATEGORY, TASK_ID,
( row_number() over (partition by task.category order by dbms_random.value)) /
( count(*) over (partition by task.category)) as rand_perc
from task
order by 1,3;
CATEGORY TASK_ID RAND_PERC
-------- ---------- ----------
approved 56 ,00757575758
approved 129 ,0151515152
approved 61 ,0227272727
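Because the row numbers in a category run from 1 to n, the condition rand_perc <= p holds for exactly floor(p * n) rows of that category. A quick check of that arithmetic for the sample data above (132, 134 and 130 rows); the column aliases are only illustrative:

```sql
-- rand_perc = rn / n, so rand_perc <= p keeps the rows with rn <= p * n,
-- that is floor(p * n) rows per category.
select floor(132 * .5) as approved_rows,   -- 66
       floor(134 * .3) as denied_rows,     -- 40 (40.2 rounded down)
       floor(130 * .2) as canceled_rows    -- 26
from dual;
```

Note that each percentage applies to the number of rows in its own category, not to the overall sample size of 132.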
To draw the sample, define the WHERE condition as required; see the example below.
with rnd as (
select CATEGORY, TASK_ID,
( row_number() over (partition by task.category order by dbms_random.value)) /
( count(*) over (partition by task.category)) as rand_perc
from task
)
select CATEGORY, count(*) cnt
from rnd
where
(category = 'approved' and rand_perc <= .5) or /* take 50% from approved */
(category = 'denied' and rand_perc <= .3) or /* take 30% from denied */
(category = 'canceled' and rand_perc <= .2) /* take 20% from canceled */
group by CATEGORY
;
This gives the sample sizes as required:
CATEGORY CNT
-------- ----------
canceled 26
denied 40
approved 66
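The query above only counts the sampled rows, while the question asks for the raw data. A minimal sketch of one way to return the rows themselves, assuming the task table defined earlier: the ratio CTE and the hard-coded total of 132 are illustrative assumptions, not part of the original answer. It filters on the random row number directly, so the sample sizes are tied to the requested total rather than to the number of rows in each category.

```sql
with ratio as (
  -- hypothetical lookup of the requested share per category
  select 'approved' category, .5 pct from dual union all
  select 'denied', .3 from dual union all
  select 'canceled', .2 from dual
),
rnd as (
  -- random row number, assigned independently within each category
  select category, task_id,
         row_number() over (partition by category order by dbms_random.value) as rn
  from task
)
select rnd.category, rnd.task_id
from rnd
join ratio on ratio.category = rnd.category
where rnd.rn <= round(132 * ratio.pct)   -- 66, 40 and 26 rows
order by rnd.category, rnd.task_id;
```

This only works if every category holds at least as many rows as its target; otherwise the category simply contributes all the rows it has.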