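How do I select different percentages of data based on a column value?

Asked: 2013-08-20 00:45:11

Tags: sql postgresql

I need to query a table that has a "gender" column, like this: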

| id | gender | name    |
-------------------------
| 1  | M      | Michael |
-------------------------
| 2  | F      | Hanna   |
-------------------------
| 3  | M      | Louie   |
-------------------------

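I need to extract the top N results with, for example, 80% males and 20% females. So if I need 1000 results, I want to get back 800 males and 200 females.

  1. Is it possible to do this in a single query? How?

  2. If there aren't enough records (say, in the example above, I only have 700 males), can it automatically select 700/300 instead?

3 Answers:

Answer 0: (score: 2)

Basically, you want to take as many 'M' rows as possible without exceeding your percentage, and then take enough 'F' rows to reach 1000 rows in total: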

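with cte_m as (
    -- take the men first, capped at 80% of the 1000 rows wanted
    select * from Table1 where gender = 'M' limit (1000 * 0.8)
), cte as (
    select *, 0 as ord from cte_m
    union all
    select *, 1 as ord from Table1 where gender = 'F'
    -- men (ord = 0) sort first, so the limit only ever trims women
    order by ord
    limit 1000
)
select id, gender, name
from cte;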

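As a rough way to check the shortfall case from the question (only 700 men available), the same query shape can be run against a throwaway table. The sketch below rests on several assumptions: it uses Python's built-in `sqlite3` module and an in-memory SQLite database as a stand-in for PostgreSQL, and the helper `pick_rows` and the generated rows are hypothetical, not part of the answer.

```python
import sqlite3
from collections import Counter


def pick_rows(conn, total=1000, male_share=0.8):
    """Take up to total * male_share men, then fill with women up to total rows."""
    male_cap = round(total * male_share)  # round() avoids float truncation surprises
    query = """
        WITH cte_m AS (
            SELECT * FROM Table1 WHERE gender = 'M' LIMIT :male_cap
        ), cte AS (
            SELECT *, 0 AS ord FROM cte_m
            UNION ALL
            SELECT *, 1 AS ord FROM Table1 WHERE gender = 'F'
            ORDER BY ord          -- men first, so LIMIT only trims women
            LIMIT :total
        )
        SELECT id, gender, name FROM cte
    """
    return conn.execute(query, {"male_cap": male_cap, "total": total}).fetchall()


# Hypothetical test data: 700 men and 500 women (SQLite here, not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, gender TEXT, name TEXT)")
rows = [(i, "M", f"M_{i}") for i in range(1, 701)]
rows += [(i, "F", f"F_{i}") for i in range(701, 1201)]
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)

picked = pick_rows(conn)
counts = Counter(gender for _, gender, _ in picked)
print(len(picked), dict(counts))
```

With 700 men and 500 women in the table, this should return all 700 men plus 300 women. PostgreSQL may differ in details such as which rows it picks when no ordering is given inside `cte_m`.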

Answer 1: (score: 0)

How about the following, which assumes you supply the row count ("lmt") and the M/F shares as floats:

create table gen (
id     integer,
gender text,
name   text
);

-- inserts 80% males and 20% females into the source table ("gen")
insert into gen select n, case when mod(n,5) = 0 then 'F' else 'M' end, (case when mod(n,5) = 0 then 'F' else 'M' end)||'_'||n::text
from generate_series(1,20000) n;


-- extract 80/20 M vs F
with conf as (select 1000 as lmt, .80::FLOAT as mpct, .20::FLOAT as fpct),
     g as (select id,gender,name,row_number() over (partition by gender order by id) rn from gen)
select *
from g
where (gender = 'M' and rn <= (select lmt*mpct from conf))
or (gender = 'F' and rn <= (select lmt*fpct from conf));


-- Same query, to show the percent M vs F:
with conf as (select 1000 as lmt, .80::FLOAT as mpct, .20::FLOAT as fpct),
     g as (select id,gender,name,row_number() over (partition by gender order by id) rn from gen)
select gender,count(*)
from (
    select *
    from g
    where (gender = 'M' and rn <= (select lmt*mpct from conf))
    or (gender = 'F' and rn <= (select lmt*fpct from conf))
    ) y
group by gender

Answer 2: (score: -1)

I don't have PostgreSQL, but the first scenario is simple using a UNION in MS SQL 2012. I assume you can do something similar in Postgres:

declare @MaxRows            INT
        ,@PercentageMale    INT
        ,@PercentageFemale  INT

select      @MaxRows = 1000
            ,@PercentageMale = 80
            ,@PercentageFemale = 20

select  top (@MaxRows*@PercentageMale/100)  *
FROM        someTable
WHERE       Gender = 'M'
UNION ALL
select  top (@MaxRows*@PercentageFemale/100)    *
FROM        someTable
WHERE       Gender = 'F'

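The second part is actually quite simple as well. Basically you want to select males up to their maximum percentage, then fill the rest of the list with females until you reach the total row count. The number of females isn't really relevant: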

declare @MaxRows            INT
        ,@PercentageMale    INT

select      @MaxRows = 1000
            ,@PercentageMale = 80

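SELECT TOP (@MaxRows) *     --TOP needs parentheses when given a variable
FROM
(
    select  top (@MaxRows*@PercentageMale/100)  *, 0 as ord
    FROM        someTable
    WHERE       Gender = 'M'
    UNION ALL
    select  top (@MaxRows)  *, 1 as ord --we never want more than @MaxRows 
                              --so no need to check for a %, 
                              --just fill in the rest of the data set
    FROM        someTable
    WHERE       Gender = 'F'
) a
ORDER BY ord                --males first, so the outer TOP only trims females
                            --(ord shows up as an extra output column)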
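
The row counts that this fill strategy produces can be worked out by hand. Here is a minimal Python sketch of that arithmetic, assuming integer percentages as in the T-SQL above. The function name `split_quota` and its parameters are made up for illustration.

```python
def split_quota(total, first_pct, first_available, second_available):
    """Return (first_taken, second_taken) for a capped-then-fill selection.

    first_pct is an integer percentage, mirroring @MaxRows*@PercentageMale/100.
    """
    first_cap = total * first_pct // 100            # e.g. 1000 * 80 // 100 = 800
    first_taken = min(first_cap, first_available)   # cannot take more than exist
    # The second group fills whatever is left, limited by how many rows it has.
    second_taken = min(total - first_taken, second_available)
    return first_taken, second_taken


enough_men = split_quota(1000, 80, 5000, 5000)    # plenty of both groups
short_on_men = split_quota(1000, 80, 700, 5000)   # the 700-men case from the question
short_on_both = split_quota(1000, 80, 700, 100)   # too few women to fill the gap
print(enough_men, short_on_men, short_on_both)
```

In the last case the result simply comes up short: the queries in this answer never top up a female shortfall with extra males.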