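I describe an abstract case here, but it resembles the one I am currently trying to solve. I know how to get a rough result with a PL/SQL block, but I wonder whether anyone can suggest a solution that uses a single select query.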
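Suppose we have a table t_people with thousands of records describing a set of people with the following attributes:
id
age, number
height in cm, number
gender, varchar2 ('male' or 'female')
We need to extract N records so that the result set meets the following conditions: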
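We can also assume that N is much smaller than the total number of rows in the table and that the problem is solvable.
How would you suggest doing this with a single select query?
Thanks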
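Answer 0: (score: 3)
You can split the data into 8 groups (three yes/no conditions give 2 × 2 × 2 combinations) and then take a proportional sample from each group to meet your requirements. A rough approach is to turn the conditions into groups, for example:
Then you can solve it like this: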
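    with flagged as (
        select t.*,
               -- 0/1 flags, named so they do not clash with the real height and age columns
               (case when height > 180 then 1 else 0 end) as is_tall,
               (case when gender = 'male' then 1 else 0 end) as is_male,
               (case when age > 40 then 1 else 0 end) as is_older
        from t_people t
    ),
    ranked as (
        select f.*,
               -- ordering by id is arbitrary but repeatable; order by dbms_random.value for a random sample
               row_number() over (partition by is_tall, is_male, is_older order by id) as seqnum
        from flagged f
    )
    select r.*
    from ranked r
    -- the caps are illustrative; buckets not listed here contribute no rows
    where (is_tall = 1 and is_male = 0 and is_older = 0 and seqnum <= 300) or
          (is_tall = 0 and is_male = 0 and is_older = 0 and seqnum <= 100) or
          (is_tall = 0 and is_male = 1 and is_older = 1 and seqnum <= 400) or
          (is_tall = 0 and is_male = 1 and is_older = 0 and seqnum <= 200);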
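Another approach is to fill the 8 buckets evenly while keeping track of the counts along each dimension (younger/older, male/female, shorter/taller). As soon as one value of a dimension reaches its target, stop filling its cells and continue with the 4 complementary cells. Repeat this until you have the desired number of rows.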
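The bucket-filling idea is only outlined in words, so here is one possible reading of it as a small Python sketch (Python rather than SQL so that it can be run without a database). The function name `greedy_fill`, the tuple layout of `rows`, the `targets` dictionary, and all the numbers are assumptions made for illustration; none of them come from the answer.

```python
from collections import Counter, defaultdict
from itertools import product


def greedy_fill(rows, targets, n):
    """Pick up to n rows, one per open cell per pass, across the 8 cells.

    rows    -- iterable of (row_id, is_older, is_male, is_tall) with 0/1 flags
    targets -- {(dimension_index, flag_value): max rows}; (1, 1) caps males
    """
    cells = defaultdict(list)
    for row_id, *flags in rows:
        cells[tuple(flags)].append(row_id)

    counts = Counter()  # rows taken so far per (dimension, value)
    picked = []
    progress = True
    while len(picked) < n and progress:
        progress = False
        for cell in product((0, 1), repeat=3):
            if len(picked) == n:
                break
            keys = list(enumerate(cell))
            # A cell is closed once it is empty or any of its three values is full.
            if not cells[cell] or any(counts[k] >= targets[k] for k in keys):
                continue
            picked.append(cells[cell].pop())
            counts.update(keys)
            progress = True
    return picked, counts


# Hypothetical data: 50 rows in every cell, and made-up targets for N = 100.
rows = [(f"{cell}-{i}", *cell)
        for cell in product((0, 1), repeat=3) for i in range(50)]
targets = {(0, 1): 30, (0, 0): 70,   # older / younger
           (1, 1): 60, (1, 0): 40,   # male / female
           (2, 1): 40, (2, 0): 60}   # taller / shorter
picked, counts = greedy_fill(rows, targets, 100)
print(len(picked), dict(counts))
```

The loop never exceeds a target, but as a greedy heuristic it can stall before reaching N when the cells that are still open run out of rows, so the result should be checked against N.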
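Answer 1: (score: 0)
I ended up using the first approach suggested by Gordon Linoff, with a few small modifications. I kept the original idea but introduced several extra subqueries that specify the desired distribution of records within the groups and build a matrix holding the desired record count for each group. There is also a global parameters section, whose only parameter is the total number of records.
The query produces very useful results: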
    with
    people as ( /* source rows: age from date_birth, height derived from age */
        select id,
               floor(months_between(sysdate, date_birth)/12) age,
               195 - least(floor(months_between(sysdate, date_birth)/12), 50) height,
               decode(sex, 1, 'male', 'female') gender
        from my_people_table
        where date_birth is not null and rownum < 100000
    ),
    params as ( /* Global params */
        select 100 rec_count -- total record count
        from dual
    ),
    age_groups as ( /* distribution by age */
        select 'group 1' age_group, .7 prc from dual union all
        select 'group 2' age_group, .3 prc from dual
    ),
    height_groups as ( /* distribution by height */
        select 'group 1' height_group, .6 prc from dual union all
        select 'group 2' height_group, .4 prc from dual
    ),
    genders as ( /* distribution by gender */
        select 'male' gender, .6 prc from dual union all
        select 'female' gender, .4 prc from dual
    ),
    mx as ( /* a matrix with record counts per group */
        select age_group, height_group, gender,
               /* ceil rounds every cell up, so the cells can add up to slightly more than rec_count */
               ceil(
                   age_groups.prc *
                   height_groups.prc *
                   genders.prc *
                   params.rec_count
               ) rec_count
        from age_groups, height_groups, genders, params
    ),
    xpeople as ( /* Minor transformations - groups and group counters */
        select p.*,
               row_number() over (
                   partition by age_group, height_group, gender
                   order by id -- any order works; id keeps the pick repeatable
               ) rec_num
        from (
            select people.*,
                   case
                       when age <= 40 then 'group 1'
                       else 'group 2'
                   end age_group,
                   case
                       when height <= 180 then 'group 1'
                       else 'group 2'
                   end height_group
            from people
        ) p
    )
    /* the resulting query uses the matrix to filter the records */
    select xpeople.*
    from xpeople join mx
        on xpeople.age_group = mx.age_group
        and xpeople.height_group = mx.height_group
        and xpeople.gender = mx.gender
        and xpeople.rec_num <= mx.rec_count;
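To see what the mx matrix holds for the shares used in the query, the same arithmetic can be replayed outside the database. This is a minimal Python sketch; the names `quota_matrix`, `AGE`, `HEIGHT`, `GENDER` and `REC_COUNT` are made up here, and only the shares and the total of 100 are copied from the query. Like the query, it multiplies the three shares, which treats age, height and gender as independent.

```python
from fractions import Fraction
from itertools import product
from math import ceil

# Shares copied from the age_groups, height_groups and genders subqueries.
AGE = {"group 1": Fraction(7, 10), "group 2": Fraction(3, 10)}
HEIGHT = {"group 1": Fraction(6, 10), "group 2": Fraction(4, 10)}
GENDER = {"male": Fraction(6, 10), "female": Fraction(4, 10)}
REC_COUNT = 100  # params.rec_count


def quota_matrix(rec_count):
    """Return {(age_group, height_group, gender): row cap}, like the mx subquery."""
    return {
        (a, h, g): ceil(AGE[a] * HEIGHT[h] * GENDER[g] * rec_count)
        for a, h, g in product(AGE, HEIGHT, GENDER)
    }


mx = quota_matrix(REC_COUNT)
for cell, cap in mx.items():
    print(cell, cap)
print("total:", sum(mx.values()))  # total: 104
```

Because every cell is rounded up, the eight caps add up to 104 rather than 100 for these shares. If exactly rec_count rows are required, the caps would need a different rounding rule or the final result would need trimming.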
Thanks for your help!