Median in T-SQL

Time: 2011-08-29 08:14:13

Tags: sql tsql sql-server-2008

For an even number of rows, the median of the table below is (104.5 + 108) / 2. For an odd number of rows, it is 108:

Total (even, 6 rows)   Total (odd, 7 rows)

100                    100
101                    101
104.5                  104.5
108                    108
108.3                  108.3
112                    112
                       114
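Assuming the n non-NULL values are sorted ascending as x_1 <= ... <= x_n (notation introduced here, not from the question), the two cases can be written out and checked against the sample columns above:

```latex
\operatorname{median} =
\begin{cases}
x_{(n+1)/2}, & n \text{ odd} \\[4pt]
\dfrac{x_{n/2} + x_{n/2+1}}{2}, & n \text{ even}
\end{cases}

% Even column (n = 6): x_3 = 104.5, x_4 = 108
\operatorname{median} = \frac{104.5 + 108}{2} = 106.25

% Odd column (n = 7): x_4 = 108
\operatorname{median} = x_4 = 108
```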

I wrote this query, which calculates the correct median when the number of rows is odd:

WITH    a AS ( SELECT   Total ,
                        ROW_NUMBER() OVER ( ORDER BY CAST(Total AS FLOAT) ASC ) rownumber
               FROM     [Table] A
             ),
        b AS ( SELECT TOP 2
                        Total ,
                        isodd
               FROM     ( SELECT TOP 50 PERCENT
                                    Total ,
                                    rownumber % 2 isodd
                          FROM      a
                          ORDER BY  CAST(Total AS FLOAT) ASC
                        ) a
               ORDER BY CAST(total AS FLOAT) DESC
             )
    SELECT  *
    FROM    b

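What is the general T-SQL query for finding the median in both cases, that is, when the number of rows is odd and when it is even?

Can my query be tweaked so that it computes the median for both an even and an odd number of rows?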

5 answers:

Answer 0 (score: 10)

I wrote a blog post about mean, median, and mode a few years ago. I encourage you to read it.

Calculating Mean, Median, and Mode with SQL Server

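SELECT ((
        -- Largest value in the lower half of the sorted, non-NULL values
        SELECT TOP 1 Total
        FROM   (
                SELECT  TOP 50 PERCENT Total
                FROM    [TABLE] A
                WHERE   Total IS NOT NULL
                ORDER BY Total
                ) AS A
        ORDER BY Total DESC) +
        (
        -- Smallest value in the upper half (the same middle row when the count is odd)
        SELECT TOP 1 Total
        FROM   (
                SELECT  TOP 50 PERCENT Total
                FROM    [TABLE] A
                WHERE   Total IS NOT NULL
                ORDER BY Total DESC
                ) AS A
        ORDER BY Total ASC)) / 2.0  -- 2.0 avoids integer division if Total is an INT column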

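Answer 1 (score: 8)

I know you're looking for a solution that works on SQL Server 2008, but in case anyone is looking for a MEDIAN() aggregate function in SQL Server 2012, they can emulate it with the PERCENTILE_CONT() inverse distribution function: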

WITH t(value) AS (
  SELECT 1   UNION ALL
  SELECT 2   UNION ALL
  SELECT 100 
)
SELECT DISTINCT
  percentile_cont(0.5) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY 1)
FROM
  t;

This emulation of MEDIAN() via PERCENTILE_CONT() is also documented here. Unfortunately, SQL Server supports this function only as a window function, not as an ordinary ordered-set aggregate function as Oracle and PostgreSQL do.
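As a rough guide to what PERCENTILE_CONT computes: it interpolates linearly between neighbouring sorted rows. The sketch below follows the definition as Oracle documents it (RN, FRN and CRN are Oracle's names; x_i is the i-th value in ascending order), which is assumed here to match SQL Server's behaviour. With p = 0.5 it reduces to the ordinary median, worked for the three-row example above and for the question's six-row column:

```latex
RN = 1 + p\,(n - 1), \qquad FRN = \lfloor RN \rfloor, \qquad CRN = \lceil RN \rceil

\text{PERCENTILE\_CONT}(p) =
\begin{cases}
x_{RN}, & FRN = CRN \\
(CRN - RN)\,x_{FRN} + (RN - FRN)\,x_{CRN}, & \text{otherwise}
\end{cases}

% t = {1, 2, 100}, n = 3, p = 0.5: RN = 2, so the result is x_2 = 2
% Question's even column, n = 6, p = 0.5: RN = 3.5, FRN = 3, CRN = 4
(4 - 3.5) \cdot 104.5 + (3.5 - 3) \cdot 108 = 52.25 + 54 = 106.25
```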

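Answer 2 (score: 3)

Unfortunately, t-clausen's answer does not work correctly when the list contains many duplicate values, because the row numbers generated by the two different OVER clauses are then unpredictable.

The following worked well in my case: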

WITH SortedTable AS
    (
        SELECT Total, 
               rnasc, 
               rndesc = ROW_NUMBER() OVER(ORDER BY rnasc DESC)
        FROM ( 
               SELECT Total, 
                      rnasc = ROW_NUMBER() OVER(ORDER BY Total)
               FROM   [Table]
             ) SourceTable
    )
SELECT DISTINCT AVG(Total) median 
FROM   SortedTable
WHERE  rnasc = rndesc OR ABS(rnasc-rndesc) = 1

The WHERE clause now also clearly distinguishes between an even and an odd number of records.
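A short derivation (not part of the original answer) of why that condition separates the two cases: because rndesc is numbered from rnasc, the two numberings are exact mirrors over the n rows, even when Total contains duplicates. Writing r_asc and r_desc for rnasc and rndesc:

```latex
r_{desc} = n + 1 - r_{asc}
\quad\Rightarrow\quad
|r_{asc} - r_{desc}| = |2\,r_{asc} - (n + 1)| \le 1

% n odd: 2 r_asc - (n + 1) is even, so it must be 0
n \text{ odd:} \quad r_{asc} = \tfrac{n+1}{2} \quad (\text{one row, where } r_{asc} = r_{desc})

% n even: 2 r_asc - (n + 1) is odd, so it must be -1 or +1
n \text{ even:} \quad r_{asc} \in \{\tfrac{n}{2},\ \tfrac{n}{2} + 1\} \quad (\text{two rows, averaged by AVG})

% Sample columns: n = 6 keeps rows 3 and 4 (104.5, 108); n = 7 keeps row 4 (108)
```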

Answer 3 (score: 3)

An example of the problem with the accepted answer that I mentioned in my comment:

select avg(Total) median from
(
select Total, 
rnasc = row_number() over(order by Total),
rndesc = row_number() over(order by Total desc)
 from [Table] 
) b
where rnasc between rndesc - 1 and rndesc + 1

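This snippet is not guaranteed to work when the input data set contains duplicate values, because row_number() then does not necessarily return the expected values.

For example, with this input: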

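-- Drop the temp table only if it exists (DROP TABLE IF EXISTS needs SQL Server 2016+)
IF OBJECT_ID('tempdb..#b') IS NOT NULL DROP TABLE #b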
CREATE TABLE #b (id INT IDENTITY, Total INT)
INSERT INTO #b 
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT  5 
UNION ALL SELECT  5 UNION ALL SELECT  5

the inner query returns the following (I suspect it may differ on other servers):

Total   rnasc   rndesc
5       3       1
5       4       2
5       5       3
1       1       4
1       2       5

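Running the outer query returns NULL, because there is no row where rnasc is between rndesc - 1 and rndesc + 1.

The simple solution is to add a surrogate key (I used an identity column) to the data set and include this column in the OVER() clauses: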

SELECT avg(Total) median from
(
SELECT Total, 
rnasc = row_number() over(order by Total, id),
rndesc = row_number() over(order by Total DESC, id desc)
 from #b
) b
WHERE rnasc between rndesc - 1 and rndesc + 1

Now the sort order is guaranteed, and the inner query returns:

Total   rnasc   rndesc
5       5       1
5       4       2
5       3       3
1       2       4
1       1       5

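The result is correct :)

Answer 4 (score: 1)

I know this is an old question, but I'm posting this anyway for the benefit of others. PERCENTILE_CONT(0.5) is slow: on a table with 4.9 million records it took 52 seconds. G Mastros's answer above is better (and my favorite, apart from mine), but it still took 35 seconds on my table. I adapted his solution as follows, and it ran in 7 seconds without an index on the column; after I added an index, it dropped to 2 seconds. All I did was replace 50 PERCENT with integer division of the table's record count.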

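-- Count only non-NULL values so the count matches the WHERE filters below
DECLARE @Cnt int = (SELECT COUNT(Total) FROM [TABLE]);

SELECT ((
    -- (@Cnt + 1) / 2 rounds up like TOP 50 PERCENT, so an odd count includes the middle row
    SELECT TOP 1 Total
    FROM   (
            SELECT  TOP ((@Cnt + 1) / 2) Total
            FROM    [TABLE] A
            WHERE   Total IS NOT NULL
            ORDER BY Total
            ) AS A
    ORDER BY Total DESC) +
    (
    SELECT TOP 1 Total
    FROM   (
            SELECT  TOP ((@Cnt + 1) / 2) Total
            FROM    [TABLE] A
            WHERE   Total IS NOT NULL
            ORDER BY Total DESC
            ) AS A
    ORDER BY Total ASC)) / 2.0  -- 2.0 avoids integer division if Total is an INT column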
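A sketch of the row-count arithmetic behind this rewrite, assuming n non-NULL values sorted as x_1 <= ... <= x_n: SQL Server rounds a TOP ... PERCENT row count up to the next integer, whereas T-SQL integer division truncates. Worked on the question's sample columns:

```latex
\text{TOP 50 PERCENT} \to \lceil n/2 \rceil = \lfloor (n+1)/2 \rfloor \text{ rows},
\qquad \text{TOP } (n/2) \to \lfloor n/2 \rfloor \text{ rows}

% n = 6: both forms take 3 rows
\lceil 6/2 \rceil = \lfloor 6/2 \rfloor = 3
\;\Rightarrow\; \frac{x_3 + x_4}{2} = \frac{104.5 + 108}{2} = 106.25

% n = 7: the forms differ by one row
\lceil 7/2 \rceil = 4 \;\Rightarrow\; \frac{x_4 + x_4}{2} = 108,
\qquad \lfloor 7/2 \rfloor = 3 \;\Rightarrow\; \frac{x_3 + x_5}{2} = \frac{104.5 + 108.3}{2} = 106.4
```

So an integer-division version matches TOP 50 PERCENT only when it takes (n + 1) / 2 rows and n counts the same rows the subqueries see, that is COUNT(Total) rather than COUNT(*), which also counts NULLs.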