I need to query a table that has a "gender" column, like this:
| id | gender | name    |
-------------------------
| 1  | M      | Michael |
-------------------------
| 2  | F      | Hanna   |
-------------------------
| 3  | M      | Louie   |
-------------------------
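I need to extract the top N results with, for example, 80% males and 20% females. So if I need 1000 results, I want to get back 800 males and 200 females.
Is it possible to do this in a single query? If so, how?
And if there aren't enough records (say, in the example above I only have 700 males), can it automatically pick 700/300 instead?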
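Answer 0 (score: 2)
Basically, you want to take as many 'M' rows as possible without exceeding your percentage, then take enough 'F' rows to bring the total to 1000: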
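-- Take at most 80% of the target as 'M' rows.
with cte_m as (
    select * from Table1 where gender = 'M' limit (1000 * 0.8)
), cte as (
    -- 'M' rows sort first (ord = 0), so the outer limit only trims 'F' rows.
    select *, 0 as ord from cte_m
    union all
    select *, 1 as ord from Table1 where gender = 'F'
    order by ord
    limit 1000
)
select id, gender, name
from cte;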
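To sanity-check this against the 700/300 case from the question, here is a minimal sketch that runs the same query shape from Python. It uses the standard-library `sqlite3` module with an in-memory database purely as a stand-in for PostgreSQL; the table name `people`, the helpers `make_db`, `pick_rows` and `gender_counts`, and the generated sample rows are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter

# Same shape as the query above, with the two limits passed as parameters.
QUERY = """
with cte_m as (
    select id, gender, name from people where gender = 'M' limit :male_cap
), cte as (
    select id, gender, name, 0 as ord from cte_m
    union all
    select id, gender, name, 1 as ord from people where gender = 'F'
    order by ord
    limit :total
)
select id, gender, name from cte
"""


def make_db(males, females):
    """Build an in-memory table with the given number of M and F rows (sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table people (id integer primary key, gender text, name text)"
    )
    rows = [("M", f"M_{i}") for i in range(males)]
    rows += [("F", f"F_{i}") for i in range(females)]
    conn.executemany("insert into people (gender, name) values (?, ?)", rows)
    return conn


def pick_rows(conn, total, male_share):
    """Return up to `total` rows, with at most `male_share` of them male."""
    male_cap = round(total * male_share)
    return conn.execute(QUERY, {"male_cap": male_cap, "total": total}).fetchall()


def gender_counts(rows):
    """Count the returned rows per gender."""
    return Counter(gender for _id, gender, _name in rows)


if __name__ == "__main__":
    print(gender_counts(pick_rows(make_db(5000, 5000), 1000, 0.8)))
    print(gender_counts(pick_rows(make_db(700, 5000), 1000, 0.8)))
```

With plenty of rows on both sides the split is 800 M and 200 F; with only 700 males it becomes 700 M and 300 F, because the outer limit keeps filling from the 'F' rows. The reverse case is not covered: if there are too few females, this query does not add extra males.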
Answer 1 (score: 0)
How about the following, assuming you supply the row count ("lmt") and the M/F shares as floats:
create table gen (
id integer,
gender text,
name text
);
-- inserts 80% males and 20% females into the source table ("gen"):
-- every 5th row is 'F', the rest are 'M'
insert into gen select n, case when mod(n,5) = 0 then 'F' else 'M' end, (case when mod(n,5) = 0 then 'F' else 'M' end)||'_'||n::text
from generate_series(1,20000) n;
-- extract 80/20 M vs F
with conf as (select 1000 as lmt, .80::FLOAT as mpct, .20::FLOAT as fpct),
g as (select id,gender,name,row_number() over (partition by gender order by gender) rn from gen)
select *
from g
where (gender = 'M' and rn <= (select lmt*mpct from conf))
or (gender = 'F' and rn <= (select lmt*fpct from conf));
-- Same query, to show the percent M vs F:
with conf as (select 1000 as lmt, .80::FLOAT as mpct, .20::FLOAT as fpct),
g as (select id,gender,name,row_number() over (partition by gender order by gender) rn from gen)
select gender,count(*)
from (
select *
from g
where (gender = 'M' and rn <= (select lmt*mpct from conf))
or (gender = 'F' and rn <= (select lmt*fpct from conf))
) y
group by gender;
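Note that the fixed per-gender caps above do not backfill: with only 700 males the query returns 700 + 200 = 900 rows. One way to get the 700/300 behaviour is to work out the two caps first and then use them in the `rn <=` conditions. Below is a rough sketch of that arithmetic in Python. The function name `gender_caps` and its parameters are made up for this example, and it assumes you have already counted the available rows per gender (for instance with a `count(*) ... group by gender`).

```python
def gender_caps(total, male_share, males_available, females_available):
    """Return (male_cap, female_cap) for a target split, backfilling shortages.

    Hypothetical helper: the names and the two-way backfill are assumptions,
    not something the original query does.
    """
    male_target = round(total * male_share)
    female_target = total - male_target

    # Never ask for more rows than a gender actually has.
    male_cap = min(male_target, males_available)
    female_cap = min(female_target, females_available)

    # Hand any shortfall to the other gender, if it has rows to spare.
    shortfall = total - (male_cap + female_cap)
    if shortfall > 0:
        extra_females = min(shortfall, females_available - female_cap)
        female_cap += extra_females
        shortfall -= extra_females
        male_cap += min(shortfall, males_available - male_cap)

    return male_cap, female_cap


if __name__ == "__main__":
    print(gender_caps(1000, 0.8, 5000, 5000))
    print(gender_caps(1000, 0.8, 700, 5000))
```

Backfilling in both directions is a design choice; the question only asks about running short of males, so the male top-up at the end could be dropped if that is not wanted.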
Answer 2 (score: -1)
I don't have PostgreSQL available, but the first scenario is simple; here it is in MS SQL 2012 using a UNION. I assume you can do something similar in Postgres:
declare @MaxRows INT
,@PercentageMale INT
,@PercentageFemale INT
select @MaxRows = 1000
,@PercentageMale = 80
,@PercentageFemale = 20
select top (@MaxRows*@PercentageMale/100) *
FROM someTable
WHERE Gender = 'M'
UNION
select top (@MaxRows*@PercentageFemale/100) *
FROM someTable
WHERE Gender = 'F'
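The second part is actually quite simple. Basically, you select males up to their percentage, then fill the rest of the list with females until you reach the total row count. The percentage of females isn't actually relevant: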
declare @MaxRows INT
,@PercentageMale INT
select @MaxRows = 1000
,@PercentageMale = 80
SELECT TOP (@MaxRows) * --TOP needs parentheses when given a variable
FROM
(
select top (@MaxRows*@PercentageMale/100) *
FROM someTable
WHERE Gender = 'M'
UNION
select top (@MaxRows) * --we never want more than @MaxRows
--so no need to check for a %,
--just fill in the rest of the data set
FROM someTable
WHERE Gender = 'F'
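) a
ORDER BY CASE WHEN Gender = 'M' THEN 0 ELSE 1 END --males first, so the outer TOP only trims surplus females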