I have generated the following result set:
"degree_easy","degree_hard","easy_percent","hard_percent"
1,5,0.166667,0.833333
1,5,0.166667,0.833333
1,6,0.142857,0.857143
1,8,0.111111,0.888889
The result set above was produced by the following query:
select * from (
    select degree_one as degree_easy,
           (degree_two + degree_three) as degree_hard,
           (degree_one::real / (degree_one::real + degree_two::real + degree_three::real)) as easy_percent,
           ((degree_two::real + degree_three::real) / (degree_one::real + degree_two::real + degree_three::real)) as hard_percent
    FROM recommendation_degree
) as dc
where dc.degree_easy >= 1 and dc.degree_hard >= 1
order by dc.easy_percent ASC, dc.hard_percent ASC
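What I want to do now is calculate percentiles.
I'm not sure which of the columns above makes the most sense to use, but suppose I want to calculate percentiles over degree_easy and degree_hard, or at least one of them. How do I do that with the ntile function in Postgres?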
What is the best practice for producing output like the following:
percentile, number_of_users
25, 4
50, 10
75, 20
99, 20
Answer 0 (score: 3)
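ntile can tell you whether you are in the lowest 25% of an ordered list, but it does not support weights. ntile splits the ordered rows into buckets with (as nearly as possible) the same number of rows, so it only fits when every row should count equally.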
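As a minimal sketch of what ntile does on its own, the query below applies it to the four (degree_easy, degree_hard) pairs from the result set in the question. The inline `dc` CTE and the `quartile` alias are assumptions made for illustration; in practice `dc` would be the subquery over recommendation_degree.

```sql
-- Illustrative data: the four rows shown in the question's result set.
with dc (degree_easy, degree_hard) as (
    values (1, 5), (1, 5), (1, 6), (1, 8)
)
select degree_easy,
       degree_hard,
       -- Split the rows, ordered by degree_hard, into 4 buckets of equal row count.
       ntile(4) over (order by degree_hard) as quartile
from dc
order by degree_hard, quartile;
```

With four rows and four buckets, each row gets its own bucket, so the two rows with degree_hard = 5 end up in quartiles 1 and 2 even though their values are identical. That is the sense in which ntile counts rows and ignores weights.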
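You can calculate weights with the sum ... over analytic (window) function. The running sum (the sum over all rows whose value is equal to or lower than the current row's) is: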
sum(col1) over (order by col1)
The sum over the entire table is:
sum(col1) over ()
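To see the two expressions side by side, here is a small sketch over a hypothetical table `t` with a single column `col1`; the table name, the values, and the `running_sum` and `total` aliases are made up for illustration.

```sql
-- Hypothetical data: four rows, two of them tied at 20.
with t (col1) as (
    values (10), (20), (20), (50)
)
select col1,
       -- Default frame with ORDER BY: all rows up to and including the current row's peers.
       sum(col1) over (order by col1) as running_sum,
       -- Empty OVER (): the frame is the whole result set.
       sum(col1) over ()              as total
from t
order by col1;
```

This should return running_sum values of 10, 50, 50, 100 and a total of 100 on every row. The two tied rows share a running sum of 50 because, with only an ORDER BY in the window, rows with equal col1 are peers and are included together.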
You can calculate the percentile by comparing the running sum with the total sum. A simplified example:
create table people (id serial, points int);
-- 3 people with 1 point, 2 people with 2 points, 1 person with 3 points
-- total 6 people and 10 points
insert into people (points) values (1), (1), (1), (2), (2), (3);
select *
,   case
        when sum(points) over (order by points) > 0.75 * sum(points) over () then '100%'
        when sum(points) over (order by points) > 0.5 * sum(points) over () then '75%'
        when sum(points) over (order by points) > 0.25 * sum(points) over () then '50%'
        else '25%'
    end as Percentile
from people;
This prints:
ID POINTS PERCENTILE
1 1 50%
2 1 50%
3 1 50%
4 2 75%
5 2 75%
6 3 100%
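The people with 1 point hold 3 points in total, which is 30% of the overall total of 10. That is above the 25% threshold but not above 50%, so they land in the 50% percentile. The people with 2 points bring the running total to 7 points (70%), which puts them in the 75% percentile. The person with 3 points brings the running total to 10 points (100%), which puts that person in the top bucket.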
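To get closer to the "percentile, number_of_users" shape asked for in the question, the same comparison can be wrapped in a CTE and grouped. This is only a sketch: it assumes number_of_users means the number of rows falling into each bucket, it reuses the sample points from the people table above as inline values, and the `ranked` CTE name is made up for illustration.

```sql
-- Same sample data as the people table: 1, 1, 1, 2, 2, 3 points.
with people (points) as (
    values (1), (1), (1), (2), (2), (3)
),
ranked as (
    select points,
           -- Same thresholds as the query above, but returning a number instead of a label.
           case
               when sum(points) over (order by points) > 0.75 * sum(points) over () then 100
               when sum(points) over (order by points) > 0.50 * sum(points) over () then 75
               when sum(points) over (order by points) > 0.25 * sum(points) over () then 50
               else 25
           end as percentile
    from people
)
select percentile,
       count(*) as number_of_users
from ranked
group by percentile
order by percentile;
```

On this sample data the result should be 50 with 3 users, 75 with 2 users, and 100 with 1 user. No row appears for 25 because nobody falls at or below the 25% threshold; a left join against a list of buckets would be needed if empty buckets must be shown.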