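What is the best way to calculate percentile rankings (e.g. the 90th percentile or the median score) in MSSQL 2005?
I'd like to be able to select the 25th percentile, the median, and the 75th percentile for a single column of scores (preferably in a single record, so I can combine them with the average, max, and min). For example, the table output of the results might be: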
Group MinScore MaxScore AvgScore pct25 median pct75
----- -------- -------- -------- ----- ------ -----
T1    52       96       74       68    76     84
T2    48       98       74       68    75     85
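Answer 0 (score: 15)
I think this would be the simplest solution:
SELECT TOP N PERCENT * FROM TheTable ORDER BY TheScore DESC
where N = (100 - desired percentile). So if you want all rows in the 90th percentile, you'd select the top 10%.
I'm not sure what you mean by "preferably in a single record". Do you mean calculating which percentile a given score for a single record would fall into? E.g. do you want to be able to make statements such as "your score is 83, which puts you in the 91st percentile"?
EDIT: OK, I thought some more about your question and came up with this interpretation. Are you asking how to calculate the cutoff score for a particular percentile? E.g. something like this: to be in the 90th percentile you must have a score greater than 78. If so, this query works. I dislike subqueries though, so depending on what it was for, I'd probably try to find a more elegant solution. It does, however, return a single record with a single score.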
-- Find the minimum score for all scores in the 90th percentile
SELECT Min(subq.TheScore) FROM
(SELECT TOP 10 PERCENT TheScore FROM TheTable
ORDER BY TheScore DESC) AS subq
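The N = (100 - desired percentile) rule can also be driven by a variable rather than a hard-coded number. The following is only a sketch: it reuses the TheTable and TheScore names from the query above, relies on the TOP (expression) PERCENT form that SQL Server 2005 accepts, and introduces @percentile as a name for illustration.

```sql
-- Sketch: cutoff score for an arbitrary percentile.
-- @percentile is a hypothetical variable, not part of the original answer.
DECLARE @percentile float
SET @percentile = 90

-- TOP ... PERCENT rounds a fractional row count up to the next whole row.
SELECT MIN(subq.TheScore) AS CutoffScore
FROM (SELECT TOP (100 - @percentile) PERCENT TheScore
      FROM TheTable
      ORDER BY TheScore DESC) AS subq
```

Setting @percentile to 50 or 75 would give the cutoffs for the median and the 75th percentile in the same way.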
Answer 1 (score: 9)
Check out the NTILE command - it will give you percentiles pretty easily!
SELECT SalesOrderID,
OrderQty,
RowNum = Row_Number() OVER(Order By OrderQty),
Rnk = RANK() OVER(ORDER BY OrderQty),
DenseRnk = DENSE_RANK() OVER(ORDER BY OrderQty),
NTile4 = NTILE(4) OVER(ORDER BY OrderQty)
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43689, 63181)
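Answer 2 (score: 2)
How about this:
-- Window functions cannot be nested inside an aggregate, so NTILE is computed in a derived table first.
-- Group is a reserved word and must be bracketed; column aliases cannot start with a digit.
SELECT
[Group],
percentile_75 = MAX(CASE WHEN quartile = 3 THEN score END),
percentile_90 = MAX(CASE WHEN decile = 9 THEN score END)
FROM (
SELECT [Group], score,
quartile = NTILE(4) OVER(PARTITION BY [Group] ORDER BY score ASC),
decile = NTILE(10) OVER(PARTITION BY [Group] ORDER BY score ASC)
FROM TheScore
) AS tiles
GROUP BY [Group]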
Answer 3 (score: 1)
I've been working on this some more, and here's what I've come up with so far:
CREATE PROCEDURE [dbo].[TestGetPercentile]
@percentile as float,
@resultval as float output
AS
BEGIN
WITH scores(score, prev_rank, curr_rank, next_rank) AS (
SELECT dblScore,
(ROW_NUMBER() OVER ( ORDER BY dblScore ) - 1.0) / ((SELECT COUNT(*) FROM TestScores) + 1) [prev_rank],
(ROW_NUMBER() OVER ( ORDER BY dblScore ) + 0.0) / ((SELECT COUNT(*) FROM TestScores) + 1) [curr_rank],
(ROW_NUMBER() OVER ( ORDER BY dblScore ) + 1.0) / ((SELECT COUNT(*) FROM TestScores) + 1) [next_rank]
FROM TestScores
)
SELECT @resultval = (
SELECT TOP 1
CASE WHEN t1.score = t2.score
THEN t1.score
ELSE
t1.score + (t2.score - t1.score) * ((@percentile - t1.curr_rank) / (t2.curr_rank - t1.curr_rank))
END
FROM scores t1, scores t2
WHERE (t1.curr_rank = @percentile OR (t1.curr_rank < @percentile AND t1.next_rank > @percentile))
AND (t2.curr_rank = @percentile OR (t2.curr_rank > @percentile AND t2.prev_rank < @percentile))
)
END
Then in another stored procedure I do this:
DECLARE @pct25 float;
DECLARE @pct50 float;
DECLARE @pct75 float;
-- Call the procedure under the name it was created with above
exec TestGetPercentile .25, @pct25 output
exec TestGetPercentile .50, @pct50 output
exec TestGetPercentile .75, @pct75 output
Select
min(dblScore) as minScore,
max(dblScore) as maxScore,
avg(dblScore) as avgScore,
@pct25 as percentile25,
@pct50 as percentile50,
@pct75 as percentile75
From TestScores
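It still doesn't quite do what I'm looking for. This gets the stats for all tests, whereas I'd like to be able to select from a TestScores table that has multiple different tests in it and get back the same stats for each different test (just like the example table in my question).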
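One possible way to get the per-test rows in a single query is sketched below. It assumes TestScores has a column identifying the test, called TestName here (a hypothetical name, since the real column is not shown). Note that it uses the nearest-rank definition of a percentile, i.e. the value at row CEILING(p * count), not the linear interpolation done by the stored procedure, so results can differ slightly.

```sql
-- Sketch: min, max, avg and nearest-rank percentiles per test.
-- TestName is an assumed grouping column; dblScore comes from the answer above.
WITH ranked AS (
    SELECT TestName,
           dblScore,
           ROW_NUMBER() OVER (PARTITION BY TestName ORDER BY dblScore) AS rn,
           COUNT(*) OVER (PARTITION BY TestName) AS cnt
    FROM TestScores
    WHERE dblScore IS NOT NULL
)
SELECT TestName,
       MIN(dblScore) AS minScore,
       MAX(dblScore) AS maxScore,
       AVG(dblScore) AS avgScore,
       -- Smallest score whose row number reaches CEILING(p * count)
       MIN(CASE WHEN rn >= CEILING(0.25 * cnt) THEN dblScore END) AS pct25,
       MIN(CASE WHEN rn >= CEILING(0.50 * cnt) THEN dblScore END) AS median,
       MIN(CASE WHEN rn >= CEILING(0.75 * cnt) THEN dblScore END) AS pct75
FROM ranked
GROUP BY TestName
```

Because rows are numbered in ascending score order within each test, taking the minimum score among rows at or beyond the target row number returns the score at exactly that row.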
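Answer 4 (score: 1)
The 50th percentile is the same as the median. To compute another percentile, say the 80th, take the bottom 80 percent of the data sorted in ascending order and the top 20 percent sorted in descending order, then average the two values where those halves meet.
Note: the median query has been around for a long time, but I can't remember where I got it from. I've only amended it to calculate other percentiles.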
DECLARE @Temp TABLE(Id INT IDENTITY(1,1), DATA DECIMAL(10,5))
INSERT INTO @Temp VALUES(0)
INSERT INTO @Temp VALUES(2)
INSERT INTO @Temp VALUES(8)
INSERT INTO @Temp VALUES(4)
INSERT INTO @Temp VALUES(3)
INSERT INTO @Temp VALUES(6)
INSERT INTO @Temp VALUES(6)
INSERT INTO @Temp VALUES(6)
INSERT INTO @Temp VALUES(7)
INSERT INTO @Temp VALUES(0)
INSERT INTO @Temp VALUES(1)
INSERT INTO @Temp VALUES(NULL)
--50th percentile or median
SELECT ((
SELECT TOP 1 DATA
FROM (
SELECT TOP 50 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA
) AS A
ORDER BY DATA DESC) +
(
SELECT TOP 1 DATA
FROM (
SELECT TOP 50 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA DESC
) AS A
ORDER BY DATA ASC)) / 2.0
--90th percentile
SELECT ((
SELECT TOP 1 DATA
FROM (
SELECT TOP 90 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA
) AS A
ORDER BY DATA DESC) +
(
SELECT TOP 1 DATA
FROM (
SELECT TOP 10 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA DESC
) AS A
ORDER BY DATA ASC)) / 2.0
--75th percentile
SELECT ((
SELECT TOP 1 DATA
FROM (
SELECT TOP 75 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA
) AS A
ORDER BY DATA DESC) +
(
SELECT TOP 1 DATA
FROM (
SELECT TOP 25 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA DESC
) AS A
ORDER BY DATA ASC)) / 2.0
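Answer 5 (score: 0)
I would probably use SQL Server 2005's
row_number() over (order by score) * 1.0 / (select count(*) from scores)
or something like that. (The multiplication by 1.0 avoids integer division, which would otherwise return 0 for every row but the last.)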
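Answer 6 (score: 0)
declare @n int, @median int, @p75 int, @p90 int
select @n = count(*) from tbl1
select @median = @n / 2
select @p75 = @n * 3 / 4
select @p90 = @n * 9 / 10
-- TOP needs parentheses around a variable, and the derived table needs an alias
select top 1 score from (select top (@median) score from tbl1 order by score asc) as t order by score desc
Is this right?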
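Answer 7 (score: 0)
The percentile rank is calculated as
(Rank - 1) / (total_rows - 1)
(with the values sorted in ascending order).
The query below will give you a percentile value between 0 and 1. The person with the lowest marks will have a percentile of 0.
-- Multiply by 1.0 to avoid integer division; [table] is a placeholder name and is bracketed because TABLE is a reserved word
SELECT Name, Marks, (rank_1 - 1) * 1.0 / ((SELECT COUNT(*) AS total_1 FROM [table]) - 1) AS percentile_rank
FROM
(
SELECT Name,
Marks,
RANK() OVER (ORDER BY Marks) AS rank_1
FROM [table]
) AS A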
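To make the (Rank - 1) / (total_rows - 1) formula concrete, here is a small worked sketch on made-up data. The table variable @Marks and its five rows are invented purely for illustration.

```sql
-- Sketch: percentile rank on five hypothetical rows.
DECLARE @Marks TABLE (Name varchar(20), Marks int)
INSERT INTO @Marks VALUES ('A', 50)
INSERT INTO @Marks VALUES ('B', 60)
INSERT INTO @Marks VALUES ('C', 60)
INSERT INTO @Marks VALUES ('D', 80)
INSERT INTO @Marks VALUES ('E', 90)

-- RANK() gives 1, 2, 2, 4, 5 for these marks (ties share a rank).
-- The * 1.0 forces decimal rather than integer division.
SELECT Name,
       Marks,
       (RANK() OVER (ORDER BY Marks) - 1) * 1.0
           / (SELECT COUNT(*) - 1 FROM @Marks) AS percentile_rank
FROM @Marks
```

With ranks 1, 2, 2, 4, 5 and five rows, the formula yields 0, 0.25, 0.25, 0.75 and 1. A table with a single row would divide by zero, so that case would need separate handling.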