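For an even number of rows, the median of the table below should be (104.5 + 108) / 2; for an odd number of rows, it should be 108:
Total (6 rows)   Total (7 rows)
100              100
101              101
104.5            104.5
108              108
108.3            108.3
112              112
                 114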
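For reference, the rule described above fits in a few lines of Python (an illustration of the arithmetic only, not T-SQL; `median` is a hypothetical helper): sort the values, take the middle one for an odd count, or average the two middle ones for an even count.

```python
def median(values):
    """Middle value for an odd count, mean of the two middle values
    for an even count; None (like SQL's NULL) when there are no values."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        return None
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

even_rows = [100, 101, 104.5, 108, 108.3, 112]
odd_rows = even_rows + [114]
print(median(even_rows))  # 106.25
print(median(odd_rows))   # 108
```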
I wrote this query, which calculates the correct median when the number of rows is odd:
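-- a: number every row by ascending Total
WITH a AS (
    SELECT Total,
           ROW_NUMBER() OVER (ORDER BY CAST(Total AS FLOAT) ASC) rownumber
    FROM [Table] A
),
-- b: take the lower half of the rows (TOP 50 PERCENT rounds up),
-- then the two largest values in that half
b AS (
    SELECT TOP 2
           Total,
           isodd
    FROM (
        SELECT TOP 50 PERCENT
               Total,
               rownumber % 2 isodd  -- parity of the row's position, not of the row count
        FROM a
        ORDER BY CAST(Total AS FLOAT) ASC
    ) a
    ORDER BY CAST(Total AS FLOAT) DESC
)
SELECT *
FROM b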
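A rough Python model of what this query returns shows why it only works for odd counts. It is an illustration, not T-SQL: `asker_query` is a hypothetical name, the `isodd` column is ignored, and it assumes SQL Server's behavior of rounding TOP ... PERCENT up to the next whole row.

```python
def asker_query(values):
    """TOP 50 PERCENT in ascending order (row count rounded up),
    then TOP 2 of that half in descending order."""
    s = sorted(values)
    half = s[:(len(s) + 1) // 2]            # lower half, rounded up
    return sorted(half, reverse=True)[:2]   # two largest values of that half

print(asker_query([100, 101, 104.5, 108, 108.3, 112, 114]))  # [108, 104.5]
print(asker_query([100, 101, 104.5, 108, 108.3, 112]))       # [104.5, 101]
```

For 7 rows the first returned row is the median. For 6 rows the upper middle value 108 never appears, so no combination of the returned rows can produce 106.25.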
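What is the usual T-SQL query for finding the median that handles both cases, i.e. when the number of rows is odd and when it is even?
Can my query be tweaked so that it returns the median for both an even and an odd number of rows?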
Answer 0 (score: 10)
A few years ago I wrote a blog post about mean, median and mode. I encourage you to read it.
Calculating Mean, Median, and Mode with SQL Server
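SELECT ((
    -- largest value in the lower half (TOP 50 PERCENT rounds up)
    SELECT TOP 1 Total
    FROM (
        SELECT TOP 50 PERCENT Total
        FROM [TABLE] A
        WHERE Total IS NOT NULL
        ORDER BY Total
    ) AS A
    ORDER BY Total DESC) +
    (
    -- smallest value in the upper half
    SELECT TOP 1 Total
    FROM (
        SELECT TOP 50 PERCENT Total
        FROM [TABLE] A
        WHERE Total IS NOT NULL
        ORDER BY Total DESC
    ) AS A
    ORDER BY Total ASC)) / 2.0  -- 2.0 avoids integer division when Total is an integer column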
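Why this handles both cases can be checked with a small Python model of the query (an illustration only; `top_half_median` is a hypothetical name, and the model assumes SQL Server rounds TOP 50 PERCENT up). For an odd count both halves contain the middle row, so the expression averages that row with itself; for an even count it averages the two middle rows. Python's `/` is true division, which corresponds to dividing by 2.0 in T-SQL.

```python
def top_half_median(values):
    """Largest value of the lower half plus smallest value of the upper
    half, divided by 2. Each half keeps ceil(n / 2) rows."""
    s = sorted(v for v in values if v is not None)  # WHERE Total IS NOT NULL
    if not s:
        return None                     # both subqueries empty: NULL
    k = (len(s) + 1) // 2               # rows kept by TOP 50 PERCENT
    lower_max = s[:k][-1]               # TOP 1 ... ORDER BY Total DESC
    upper_min = s[-k:][0]               # TOP 1 ... ORDER BY Total ASC
    return (lower_max + upper_min) / 2

even_rows = [100, 101, 104.5, 108, 108.3, 112]
odd_rows = even_rows + [114]
print(top_half_median(even_rows))  # 106.25
print(top_half_median(odd_rows))   # 108.0
```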
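Answer 1 (score: 8)
I know you're looking for a solution that works with SQL Server 2008, but in case anyone is looking for a MEDIAN() aggregate function in SQL Server 2012, they can emulate it with PERCENTILE_CONT(), the inverse distribution function: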
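WITH t(value) AS (
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 100
)
-- PERCENTILE_CONT is used as a window function here, so it returns one row
-- per input row; DISTINCT collapses them into a single median
SELECT DISTINCT
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) OVER () AS median  -- 2
FROM t;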
This emulation of MEDIAN() via PERCENTILE_CONT() is also documented here. Unfortunately, SQL Server only supports this function as a window function, not as an ordinary ordered-set aggregate function as Oracle and PostgreSQL do.
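For the curious, PERCENTILE_CONT interpolates linearly between the two rows nearest to the requested position. The Python sketch below assumes the usual definition of that position, RN = 1 + p * (N - 1) over the sorted non-NULL values (1-based); `percentile_cont` is a hypothetical helper, not a SQL Server API. With p = 0.5 it reduces to the median for both odd and even counts.

```python
def percentile_cont(values, p):
    """Linear-interpolation percentile over the sorted, non-NULL values,
    assuming the position RN = 1 + p * (N - 1) (1-based)."""
    s = sorted(v for v in values if v is not None)
    if not s:
        return None                    # no rows: NULL
    pos = p * (len(s) - 1)             # same position, 0-based
    lo = int(pos)                      # floor of pos (pos >= 0)
    hi = min(lo + 1, len(s) - 1)       # next row, clamped at the end
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

print(percentile_cont([1, 2, 100], 0.5))                         # 2.0
print(percentile_cont([100, 101, 104.5, 108, 108.3, 112], 0.5))  # 106.25
```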
Answer 2 (score: 3)
The following worked well in my case:
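WITH SortedTable AS
(
    SELECT Total,
           rnasc,
           -- rnasc is unique, so numbering by it descending gives the exact reverse order
           rndesc = ROW_NUMBER() OVER (ORDER BY rnasc DESC)
    FROM (
        SELECT Total,
               rnasc = ROW_NUMBER() OVER (ORDER BY Total)
        FROM [Table]
    ) SourceTable
)
SELECT DISTINCT AVG(Total) median
FROM SortedTable
-- odd count: the single middle row has rnasc = rndesc;
-- even count: the two middle rows differ by exactly 1
WHERE rnasc = rndesc OR ABS(rnasc - rndesc) = 1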
The WHERE clause also makes the distinction between an even and an odd number of records explicit.
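A Python model of this query makes the rnasc/rndesc relationship explicit: because rndesc is computed from the unique rnasc, rndesc = n + 1 - rnasc for every row. This is illustrative only; `row_number_median` is a hypothetical name, and the model assumes Total contains no NULLs (ROW_NUMBER() would number NULL rows too).

```python
def row_number_median(values):
    """Number rows 1..n ascending (rnasc), derive rndesc = n + 1 - rnasc,
    and average the rows where the two are equal or differ by 1."""
    s = sorted(values)
    n = len(s)
    middle = []
    for rnasc, total in enumerate(s, start=1):
        rndesc = n + 1 - rnasc
        if rnasc == rndesc or abs(rnasc - rndesc) == 1:
            middle.append(total)
    return sum(middle) / len(middle) if middle else None

print(row_number_median([100, 101, 104.5, 108, 108.3, 112, 114]))  # 108.0
print(row_number_median([100, 101, 104.5, 108, 108.3, 112]))       # 106.25
```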
Answer 3 (score: 3)
An example of the problem with the accepted answer, which I mentioned in my comment:
select avg(Total) median from
(
select Total,
rnasc = row_number() over(order by Total),
rndesc = row_number() over(order by Total desc)
from [Table]
) b
where rnasc between rndesc - 1 and rndesc + 1
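This snippet is not guaranteed to work if the input data set contains duplicate values, because row_number() may then not produce the expected numbering.
For example, with this input: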
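-- a bare DROP TABLE #b fails on the first run, when #b does not exist yet
IF OBJECT_ID('tempdb..#b') IS NOT NULL DROP TABLE #b
CREATE TABLE #b (id INT IDENTITY, Total INT)
INSERT INTO #b
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 5
UNION ALL SELECT 5 UNION ALL SELECT 5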
The inner query returns the following (I suspect this may differ between servers):
Total  rnasc  rndesc
5      3      1
5      4      2
5      5      3
1      1      4
1      2      5
Running the outer query returns NULL, because there is no row where rnasc is between rndesc - 1 and rndesc + 1.
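The failure can be reproduced outside SQL Server by applying the outer filter to exactly the rows listed above (a Python illustration; `outer_query` is a hypothetical name and None stands in for NULL). Each numbering is valid on its own, since ties among the 5s and among the 1s may be broken in any order, but the two numberings are no longer mirror images: the row with rnasc = 3 does not get rndesc = 3.

```python
# (Total, rnasc, rndesc) exactly as listed in the inner-query output above
rows = [
    (5, 3, 1),
    (5, 4, 2),
    (5, 5, 3),
    (1, 1, 4),
    (1, 2, 5),
]

def outer_query(rows):
    """AVG(Total) over rows WHERE rnasc BETWEEN rndesc - 1 AND rndesc + 1;
    None when no row qualifies, as AVG over no rows gives NULL."""
    picked = [t for t, rnasc, rndesc in rows if rndesc - 1 <= rnasc <= rndesc + 1]
    return sum(picked) / len(picked) if picked else None

print(outer_query(rows))  # None
```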
The simple solution is to add a surrogate key to the data set (I used an identity column) and include that column in the OVER() clauses:
SELECT avg(Total) median from
(
SELECT Total,
rnasc = row_number() over(order by Total, id),
rndesc = row_number() over(order by Total DESC, id desc)
from #b
) b
WHERE rnasc between rndesc - 1 and rndesc + 1
Now the sort order is guaranteed, and the inner query returns:
Total  rnasc  rndesc
5      5      1
5      4      2
5      3      3
1      2      4
1      1      5
The result is correct :)
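Why the tie-breaker works can be checked in Python with a model of the two ROW_NUMBER() calls, using the ids and values inserted into #b (illustrative only; `numbered` is a hypothetical helper). With the unique id in both ORDER BY lists the ordering is total, so the descending numbering is the exact mirror of the ascending one: rnasc + rndesc = n + 1 for every row.

```python
def numbered(rows):
    """rows: (id, Total) pairs. Emulates
    ROW_NUMBER() OVER (ORDER BY Total, id) and
    ROW_NUMBER() OVER (ORDER BY Total DESC, id DESC)."""
    asc = sorted(rows, key=lambda r: (r[1], r[0]))
    desc = sorted(rows, key=lambda r: (r[1], r[0]), reverse=True)
    rnasc = {row_id: i for i, (row_id, _) in enumerate(asc, start=1)}
    rndesc = {row_id: i for i, (row_id, _) in enumerate(desc, start=1)}
    return [(total, rnasc[row_id], rndesc[row_id]) for row_id, total in rows]

b = [(1, 1), (2, 1), (3, 5), (4, 5), (5, 5)]  # (id, Total) as inserted into #b
out = numbered(b)
n = len(b)
assert all(ra + rd == n + 1 for _, ra, rd in out)  # exact mirror numbering
picked = [t for t, ra, rd in out if rd - 1 <= ra <= rd + 1]
print(sum(picked) / len(picked))  # 5.0
```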
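Answer 4 (score: 1)
I know this is an old question, but I'm posting this anyway for the benefit of others. PERCENTILE_CONT(0.5) is slow: on a table with 4.9 million records, PERCENTILE_CONT(0.5) took 52 seconds. G Mastros's answer above is better (and my favorite, apart from mine), but it still took 35 seconds on my table. I adapted his solution as follows, and it ran in 7 seconds without an index on the column; after I added an index, it dropped to 2 seconds. All I did was replace 50 PERCENT with an integer division of the table's record count.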
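-- count only non-NULL values, since both subqueries filter out NULLs
DECLARE @Cnt int = (SELECT COUNT(Total) FROM [TABLE]);
SELECT ((
    SELECT TOP 1 Total
    FROM (
        -- (@Cnt + 1) / 2 rounds up like TOP 50 PERCENT, so for an odd
        -- count both halves include the middle row
        SELECT TOP ((@Cnt + 1) / 2) Total
        FROM [TABLE] A
        WHERE Total IS NOT NULL
        ORDER BY Total
    ) AS A
    ORDER BY Total DESC) +
    (
    SELECT TOP 1 Total
    FROM (
        SELECT TOP ((@Cnt + 1) / 2) Total
        FROM [TABLE] A
        WHERE Total IS NOT NULL
        ORDER BY Total DESC
    ) AS A
    ORDER BY Total ASC)) / 2.0  -- 2.0 avoids integer division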
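One thing to watch with a TOP (@Cnt/2) variant: plain integer division rounds down, while TOP 50 PERCENT rounds up, and the difference matters for odd counts. A Python model (illustrative; `top_n_median`, `floor_half` and `ceil_half` are hypothetical names) using the 7-row column from the question:

```python
def top_n_median(values, take):
    """Largest of the take(n) smallest non-NULL values plus the smallest
    of the take(n) largest, divided by 2; None when TOP returns no rows."""
    s = sorted(v for v in values if v is not None)
    k = take(len(s))
    if k == 0:
        return None
    return (s[:k][-1] + s[-k:][0]) / 2

def floor_half(n):
    return n // 2          # like TOP (@Cnt / 2)

def ceil_half(n):
    return (n + 1) // 2    # rounds up, like TOP 50 PERCENT

odd_rows = [100, 101, 104.5, 108, 108.3, 112, 114]
print(round(top_n_median(odd_rows, floor_half), 2))  # 106.4, not the median
print(round(top_n_median(odd_rows, ceil_half), 2))   # 108.0
```

With the rounded-down count the two halves no longer share the middle row, so the result averages 104.5 and 108.3 instead of returning 108. For even counts both versions agree.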