Partitioning data into percentile ranges and assigning a different value to each range

Date: 2013-05-23 14:43:43

Tags: mysql

I have a table with the following structure:

Temp
 Customer_id | sum

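Now I have to create a view with an additional column, customer_type. With customers ordered by sum descending (the total number of customers can vary), the column should be 1 if the customer is in the top 10%, 2 if the customer falls between 10% and 20%, 3 between 20% and 60%, and 4 between 60% and 100%. How can I do this?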

I was only able to extract the top 10% and the 10%-20% rows, but could not assign the values (source):

SELECT * FROM temp WHERE sum >= (SELECT MIN(t1.sum) FROM temp t1 
WHERE (SELECT count(*) FROM temp t2 WHERE t2.sum >= t1.sum) <= 
(SELECT 0.1 * count(*) FROM temp));

and (not efficient, just an extension of the code above):

select * from temp t1 
where (select count(*) from temp t2 where t2.sum>=t1.sum)
>= (select 0.1 * count(*) from temp) and (select count(*) from temp t2 where t2.sum>=t1.sum)
<= (select 0.2 * count(*) from temp);
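
The counting idea in the two queries above can be extended to label every row at once by moving the comparison into a CASE expression. The following is only a minimal sketch, assuming the temp(customer_id, sum) table from the question; the view name customer_labels is hypothetical. It avoids user variables, which MySQL does not allow in a view definition, at the cost of a correlated count per row, so it is likely to be slow on large tables.

```sql
-- Sketch only: customer_labels is a hypothetical view name.
-- rank of a row = number of rows whose sum is >= this row's sum
CREATE VIEW customer_labels AS
SELECT
    t1.customer_id,
    t1.sum,
    CASE
        WHEN (SELECT COUNT(*) FROM temp t2 WHERE t2.sum >= t1.sum)
             <= 0.10 * (SELECT COUNT(*) FROM temp) THEN 1
        WHEN (SELECT COUNT(*) FROM temp t2 WHERE t2.sum >= t1.sum)
             <= 0.20 * (SELECT COUNT(*) FROM temp) THEN 2
        WHEN (SELECT COUNT(*) FROM temp t2 WHERE t2.sum >= t1.sum)
             <= 0.60 * (SELECT COUNT(*) FROM temp) THEN 3
        ELSE 4
    END AS customer_type
FROM temp t1;
```

Customers with equal sum values receive the same rank here, so ties always end up with the same customer_type.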

Sample data is available on sqlfiddle.com

2 answers:

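Answer 0: (score: 2)

This should help you. You need the row number of each row, ordered by sum, and the total row count. I am sure you can easily work out the rest.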

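SELECT  
    *,
    -- running row number (backticks because ROW_NUMBER is a reserved word in MySQL 8.0+)
    @curRow := @curRow + 1 AS `row_number`,
    -- row position as a fraction of the total row count c
    (@curRow2 := @curRow2 + 1) / c AS pct_row
FROM    
    temp t
    JOIN (SELECT @curRow := 0) r
    JOIN (SELECT @curRow2 := 0) r2
    JOIN (SELECT COUNT(*) c FROM temp) s
ORDER BY 
    sum DESC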
This is based on this answer

Answer 1: (score: 2)

This is how I solved the problem. Thanks to @twn08 for the answer that got me there.

select customer_id, sum,
case 
    when pct_row <= 0.10 then 1
    when pct_row > 0.10 and pct_row <= 0.20 then 2
    when pct_row > 0.20 and pct_row <= 0.60 then 3
    when pct_row > 0.60 then 4
end as customer_label
from (
    -- pct_row: position of the row (by sum, descending) divided by the total row count c
    select customer_id, sum, (@curRow := @curRow + 1) / c as pct_row
    from temp t 
    join (select @curRow := 0) r
    join (select count(*) c from temp) s
    order by sum desc
) p;

I don't know whether this is an efficient approach, but it works for small data sets.
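
For readers on MySQL 8.0 or later, the same row-position fraction can be computed with window functions instead of user variables. This is a hedged sketch rather than the approach used in the answers: it assumes the temp(customer_id, sum) table from the question, and the alias ranked is hypothetical. Because it contains no user variables, the statement could also serve as the body of a view.

```sql
-- Sketch only, assuming MySQL 8.0+ (window functions are not available in 5.x).
SELECT
    customer_id,
    `sum`,
    CASE
        WHEN pct_row <= 0.10 THEN 1
        WHEN pct_row <= 0.20 THEN 2
        WHEN pct_row <= 0.60 THEN 3
        ELSE 4
    END AS customer_type
FROM (
    SELECT
        customer_id,
        `sum`,
        -- position of the row (1 = largest sum) divided by the total row count
        ROW_NUMBER() OVER (ORDER BY `sum` DESC) / COUNT(*) OVER () AS pct_row
    FROM temp
) ranked;
```

ROW_NUMBER() breaks ties arbitrarily, so two customers with the same sum can land in different buckets at a boundary; replacing it with RANK() would keep tied customers together.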