sql server中没有中值函数,所以我使用了这个很棒的建议:
https://stackoverflow.com/a/2026609/117700
这会计算整个数据集的中位数,但我需要每条记录的中位数。
我的数据集是:
+-----------+-------------+
| client_id | TimesTested |
+-----------+-------------+
| 214220 | 1 |
| 215425 | 1 |
| 212839 | 4 |
| 215249 | 1 |
| 210498 | 3 |
| 110655 | 1 |
| 110655 | 1 |
| 110655 | 12 |
| 215425 | 4 |
| 100196 | 1 |
| 110032 | 1 |
| 110032 | 1 |
| 101944 | 3 |
| 101232 | 2 |
| 101232 | 1 |
+-----------+-------------+
这是我正在使用的查询:
select client_id,
(
SELECT
(
(SELECT MAX(TimesTested ) FROM
(SELECT TOP 50 PERCENT t.TimesTested
FROM counted3 t
where t.timestested>1
and CLIENT_ID=t.CLIENT_ID
ORDER BY t.TimesTested ) AS BottomHalf)
+
(SELECT MIN(TimesTested ) FROM
(SELECT TOP 50 PERCENT t.TimesTested
FROM counted3 t
where t.timestested>1
and CLIENT_ID=t.CLIENT_ID
ORDER BY t.TimesTested DESC) AS TopHalf)
) / 2 AS Median
) TotalAvgTestFreq
from counted3
group by client_id
但它提供了我有趣的数据:
+-----------+------------------+
| client_id | median???????????|
+-----------+------------------+
| 100007 | 84 |
| 100008 | 84 |
| 100011 | 84 |
| 100014 | 84 |
| 100026 | 84 |
| 100027 | 84 |
| 100028 | 84 |
| 100029 | 84 |
| 100042 | 84 |
| 100043 | 84 |
| 100071 | 84 |
| 100072 | 84 |
| 100074 | 84 |
+-----------+------------------+
我可以获得每个client_id的中位数吗?
我目前正在尝试使用Aaron网站上的 awesome 查询:
select c3.client_id,(
SELECT AVG(1.0 * TimesTested ) median
FROM
(
SELECT o.TimesTested ,
rn = ROW_NUMBER() OVER (ORDER BY o.TimesTested ), c.c
FROM counted3 AS o
CROSS JOIN (SELECT c = COUNT(*) FROM counted3) AS c
where count>1
) AS x
WHERE rn IN ((c + 1)/2, (c + 2)/2)
) a
from counted3 c3
group by c3.client_id
不幸的是,正如Richardthekiwi指出的那样:
是单个中位数,而这个问题是关于中位数 每个分区的
我想知道我如何加入counted3
以获得每个分区的中位数?>
答案 0 :(得分:1)
试试这个:
select client_id,
(
SELECT
(
(SELECT MAX(testfreq) FROM
(SELECT TOP 50 PERCENT t.testfreq
FROM counted3 t
where t.timestested>1
and c3.CLIENT_ID=t.CLIENT_ID
ORDER BY t.testfreq) AS BottomHalf)
+
(SELECT MIN(testfreq) FROM
(SELECT TOP 50 PERCENT t.testfreq
FROM counted3 t
where t.timestested>1
and c3.CLIENT_ID=t.CLIENT_ID
ORDER BY t.testfreq DESC) AS TopHalf)
) / 2 AS Median
) TotalAvgTestFreq
from counted3 c3
group by client_id
我将c3别名添加到外部CLIENT_ID引用和外部表。
答案 1 :(得分:1)
注意:如果testFreq是int
或bigint
类型,则需要在取平均值之前对其进行CAST,否则您将获得整数除法,例如: (2+5)/2 => 3
如果2和5是中位数记录 - 例如AVG(Cast(testfreq as float))
。
select client_id, avg(testfreq) median_testfreq
from
(
select client_id,
testfreq,
rn=row_number() over (partition by CLIENT_ID
order by testfreq),
c=count(testfreq) over (partition by CLIENT_ID)
from tbk
where timestested>1
) g
where rn in (round(c/2,0),c/2+1)
group by client_id;
中位数既可以作为ODD行数的中心记录,也可以作为偶数行中两个中心记录的平均值。这由条件rn in (round(c/2,0),c/2+1)
处理,条件选择所需的一个或两个记录。