How do I get the median for each record?

Time: 2012-10-29 18:00:42

Tags: sql sql-server sql-server-2008 tsql

SQL Server has no median function, so I am using this great suggestion:

https://stackoverflow.com/a/2026609/117700

That computes the median over the entire data set, but I need the median for each record.

My data set is:

+-----------+-------------+
| client_id | TimesTested |
+-----------+-------------+
|    214220 |           1 |
|    215425 |           1 |
|    212839 |           4 |
|    215249 |           1 |
|    210498 |           3 |
|    110655 |           1 |
|    110655 |           1 |
|    110655 |          12 |
|    215425 |           4 |
|    100196 |           1 |
|    110032 |           1 |
|    110032 |           1 |
|    101944 |           3 |
|    101232 |           2 |
|    101232 |           1 |
+-----------+-------------+

This is the query I am using:

select client_id,  
    (
    SELECT
    (
     (SELECT MAX(TimesTested ) FROM
       (SELECT TOP 50 PERCENT t.TimesTested 
       FROM counted3 t 
       where t.timestested>1 
       and CLIENT_ID=t.CLIENT_ID 
       ORDER BY t.TimesTested ) AS BottomHalf)
     +
     (SELECT MIN(TimesTested ) FROM
       (SELECT TOP 50 PERCENT t.TimesTested 
       FROM counted3 t 
       where t.timestested>1 
       and CLIENT_ID=t.CLIENT_ID 
       ORDER BY t.TimesTested DESC) AS TopHalf)
    ) / 2 AS Median
    ) TotalAvgTestFreq
from counted3 

group by client_id

But it gives me funny data:

+-----------+------------------+
| client_id | median???????????|
+-----------+------------------+
|    100007 |               84 |
|    100008 |               84 |
|    100011 |               84 |
|    100014 |               84 |
|    100026 |               84 |
|    100027 |               84 |
|    100028 |               84 |
|    100029 |               84 |
|    100042 |               84 |
|    100043 |               84 |
|    100071 |               84 |
|    100072 |               84 |
|    100074 |               84 |
+-----------+------------------+

Can I get the median for each client_id?

I am currently trying to use the awesome query from Aaron's site:

select c3.client_id,(
    SELECT AVG(1.0 * TimesTested ) median
    FROM
    (
        SELECT o.TimesTested , 
        rn = ROW_NUMBER() OVER (ORDER BY o.TimesTested ), c.c
        FROM counted3 AS o
        CROSS JOIN (SELECT c = COUNT(*) FROM counted3) AS c
        where count>1
    ) AS x
    WHERE rn IN ((c + 1)/2, (c + 2)/2)
    ) a
    from counted3 c3
    group by c3.client_id

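Unfortunately, as RichardTheKiwi points out:

  it is for a single median, whereas this question is about a median per partition

I would like to know how I can join to counted3 to get the median per partition.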

2 answers:

Answer 0 (score: 1)

Try this:

select client_id,  
    (
    SELECT
    (
     (SELECT MAX(testfreq) FROM
       (SELECT TOP 50 PERCENT t.testfreq 
       FROM counted3 t 
       where t.timestested>1 
       and c3.CLIENT_ID=t.CLIENT_ID 
       ORDER BY t.testfreq) AS BottomHalf)
     +
     (SELECT MIN(testfreq) FROM
       (SELECT TOP 50 PERCENT t.testfreq 
       FROM counted3 t 
       where t.timestested>1 
       and c3.CLIENT_ID=t.CLIENT_ID 
       ORDER BY t.testfreq DESC) AS TopHalf)
    ) / 2 AS Median
    ) TotalAvgTestFreq
from counted3 c3

group by client_id

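I added the c3 alias to the outer table and to the outer CLIENT_ID reference. Without the alias, the unqualified CLIENT_ID inside each subquery resolves to t.CLIENT_ID, so the condition compares the column with itself, is always true, and every client gets the median of the whole table.

Answer 1 (score: 1)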
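To see why the alias matters, here is a minimal sketch. The table variable @t and its two rows are made up for illustration and are not part of the question. It puts an unqualified column reference next to a qualified one inside a correlated subquery:

```sql
-- Hypothetical two-row table, used only to illustrate name resolution.
DECLARE @t TABLE (client_id int, v int);
INSERT INTO @t (client_id, v) VALUES (1, 10), (2, 20);

SELECT o.client_id,
       -- Unqualified client_id binds to the inner table i, so the
       -- predicate is i.client_id = i.client_id and matches every row.
       (SELECT MAX(i.v) FROM @t AS i
        WHERE client_id = i.client_id)   AS not_correlated,
       -- Qualifying with the outer alias makes the subquery correlated.
       (SELECT MAX(i.v) FROM @t AS i
        WHERE o.client_id = i.client_id) AS correlated
FROM @t AS o;
```

Under these assumptions, not_correlated should be 20 for both rows, while correlated should follow each row's own client_id. The question's result shows the same effect, with 84 repeated for every client.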

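Note: if testFreq is an int or bigint, you need to CAST it before taking the average, otherwise you will get integer division, e.g. (2+5)/2 => 3 if 2 and 5 are the median records. For example: AVG(Cast(testfreq as float))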
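A small sketch of the difference, using an inline VALUES list (the derived table t and its column v are invented for the example) in place of the two median records:

```sql
-- 2 and 5 stand in for the two middle records of a partition.
SELECT (2 + 5) / 2           AS int_division,  -- integer arithmetic: 3
       AVG(v)                AS int_avg,       -- AVG over int stays int: 3
       AVG(CAST(v AS float)) AS float_avg      -- cast first: 3.5
FROM (VALUES (2), (5)) AS t(v);
```

Multiplying by 1.0 instead of casting, as in AVG(1.0 * v), is another common way to force non-integer arithmetic.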

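select client_id,
       avg(testfreq) median_testfreq   -- cast first if testfreq is an integer type (see note)
from
(
    select client_id,
           testfreq,
           rn=row_number() over (partition by CLIENT_ID
                                 order by testfreq),
           c=count(testfreq) over (partition by CLIENT_ID)
    from tbk
    where timestested>1
) g
where rn in ((c+1)/2,(c+2)/2)   -- integer division: the middle row, or the two middle rows
group by client_id;

The median is the central record when the number of rows is odd, or the average of the two central records when it is even. The condition rn in ((c+1)/2,(c+2)/2) handles both cases. With integer division the two expressions name the same row when c is odd (c=3 gives 2 and 2) and the two middle rows when c is even (c=4 gives 2 and 3).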
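As a self-contained sketch, the same per-partition idea can be tried against the sample rows from the question. The table variable @counted3 is an assumption standing in for the real counted3 table, the timestested>1 filter is left out so that every sample client appears, and the row selection uses the (c + 1)/2, (c + 2)/2 form from the query quoted in the question:

```sql
-- Hypothetical table variable loaded with the question's sample data.
DECLARE @counted3 TABLE (client_id int, TimesTested int);

INSERT INTO @counted3 (client_id, TimesTested) VALUES
    (214220, 1), (215425, 1), (212839, 4), (215249, 1), (210498, 3),
    (110655, 1), (110655, 1), (110655, 12), (215425, 4), (100196, 1),
    (110032, 1), (110032, 1), (101944, 3), (101232, 2), (101232, 1);

SELECT client_id,
       AVG(1.0 * TimesTested) AS median_TimesTested  -- 1.0 * avoids integer averaging
FROM (
    SELECT client_id,
           TimesTested,
           -- position of each row within its client, ordered by value
           ROW_NUMBER() OVER (PARTITION BY client_id
                              ORDER BY TimesTested) AS rn,
           -- number of rows for that client
           COUNT(*) OVER (PARTITION BY client_id)   AS c
    FROM @counted3
) AS g
WHERE rn IN ((c + 1) / 2, (c + 2) / 2)  -- middle row, or the two middle rows
GROUP BY client_id
ORDER BY client_id;
```

With this data, client 215425 (values 1 and 4) should come out as 2.5 and client 110655 (values 1, 1, 12) as 1, which is a quick way to check that the median is computed per client rather than over the whole table.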