I have the following MySQL query, which works perfectly:
select
count(*) as `# of Data points`,
name,
max((QNTY_Sell/QNTYDelivered)*1000) as `MAX Thousand Price`,
min((QNTY_Sell/QNTYDelivered)*1000) as `MIN Thousand Price`,
avg((QNTY_Sell/QNTYDelivered)*1000) as `MEAN Thousand Price`
from
table_name
where
year(date) >= 2012 and
name like "%the_name%" and
QNTYDelivered > 0 and
QNTY_Sell > 0
group by name
order by name;
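Now I would also like to add a result column that gives the MEDIAN for each row. In a perfect world, the SELECT list would simply include something like this: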
median((QNTY_Sell/QNTYDelivered)*1000) as `MEDIAN Thousand Price`
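Searching Google for a MySQL median function led me to this answer, which seems fine if you want the median of the data set for an entire table: Simple way to calculate median with MySQL
The difference here is that I am grouping the data in my table by the name column, and I want the median for each row of that grouped result, i.e. one median per name.
Does anyone know a nifty trick that would let me do this?
Thanks!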
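Answer 0 (score: 3)
You can calculate the median with GROUP BY in MySQL even though there is no built-in median function.
Consider the table sale, with columns name and v: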
Acrington 200.00
Acrington 200.00
Acrington 300.00
Acrington 400.00
Bulingdon 200.00
Bulingdon 300.00
Bulingdon 400.00
Bulingdon 500.00
Cardington 100.00
Cardington 149.00
Cardington 151.00
Cardington 300.00
Cardington 300.00
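To follow along, the sample data can be loaded as sketched below. The table name sale and the column names name and v are taken from the queries later in this answer; the column types are assumptions chosen only to match the values shown.

```sql
-- Assumed schema: only the names sale, name and v come from the answer's queries.
CREATE TABLE sale (
    name VARCHAR(32)   NOT NULL,
    v    DECIMAL(10,2) NOT NULL
);

-- The thirteen sample rows listed above.
INSERT INTO sale (name, v) VALUES
    ('Acrington', 200.00), ('Acrington', 200.00),
    ('Acrington', 300.00), ('Acrington', 400.00),
    ('Bulingdon', 200.00), ('Bulingdon', 300.00),
    ('Bulingdon', 400.00), ('Bulingdon', 500.00),
    ('Cardington', 100.00), ('Cardington', 149.00),
    ('Cardington', 151.00), ('Cardington', 300.00),
    ('Cardington', 300.00);
```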
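For each row you can count how many values in the same name group are smaller than it. You can also count how many are less than or equal to it: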
name v < <=
Acrington 200.00 0 2
Acrington 200.00 0 2
Acrington 300.00 2 3
Acrington 400.00 3 4
Bulingdon 200.00 0 1
Bulingdon 300.00 1 2
Bulingdon 400.00 2 3
Bulingdon 500.00 3 4
Cardington 100.00 0 1
Cardington 149.00 1 2
Cardington 151.00 2 3
Cardington 300.00 3 5
Cardington 300.00 3 5
These counts come from the query:
SELECT name,v, (SELECT COUNT(1) FROM sale WHERE v<o.v AND name=o.name) as ls
, (SELECT COUNT(1) FROM sale WHERE v<=o.v AND name=o.name) as lse
FROM sale o
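The median values are found on the rows where half the number of items in the group lies between the "less than" count and the "less than or equal" count.
Acrington has 4 items. Half of that is 2, which lies in the range 0..2 (value 200.00) and also in the range 2..3 (value 300.00).
Bulingdon also has 4 items. 2 lies in the range 1..2 (value 300.00) and in the range 2..3 (value 400.00).
Cardington has 5 items. Half of that is 2.5, which lies between 2 and 3, corresponding to the Cardington value 151.00.
The median is the mean of the minimum and maximum values returned by: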
SELECT cs.name,v
FROM
(SELECT name,v, (SELECT COUNT(1) FROM sale WHERE v<o.v AND name=o.name) as ls
, (SELECT COUNT(1) FROM sale WHERE v<=o.v AND name=o.name) as lse
FROM sale o) cs JOIN
(SELECT name,COUNT(1)*.5 as cn
FROM sale
GROUP BY name) cc ON cs.name=cc.name
WHERE cn between ls and lse
which gives:
Acrington 200.00
Acrington 200.00
Acrington 300.00
Bulingdon 300.00
Bulingdon 400.00
Cardington 151.00
Finally, we can get the median:
SELECT name,(MAX(v)+MIN(v))/2 FROM
(SELECT cs.name,v
FROM
(SELECT name,v, (SELECT COUNT(1) FROM sale WHERE v<o.v AND name=o.name) as ls
, (SELECT COUNT(1) FROM sale WHERE v<=o.v AND name=o.name) as lse
FROM sale o) cs JOIN
(SELECT name,COUNT(1)*.5 as cn
FROM sale
GROUP BY name) cc ON cs.name=cc.name
WHERE cn between ls and lse
) AS medians
GROUP BY name
giving
Acrington 250.000000
Bulingdon 350.000000
Cardington 151.000000
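On MySQL 8.0 or later, the same "pick the middle row or rows, then average them" idea can probably be expressed more directly with window functions. This is a different technique from the correlated subqueries above, not part of the original answer, and it is only a sketch; it assumes the same sale table with columns name and v.

```sql
-- Assumes MySQL 8.0+ (window functions) and the sale(name, v) table from the queries above.
SELECT name, AVG(v) AS median
FROM (
    SELECT name,
           v,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY v) AS rn,  -- position within the group
           COUNT(*)     OVER (PARTITION BY name)            AS cnt  -- size of the group
    FROM sale
) AS ranked
-- Odd cnt: both expressions give the single middle position.
-- Even cnt: they give the two middle positions.
WHERE rn IN (FLOOR((cnt + 1) / 2), FLOOR((cnt + 2) / 2))
GROUP BY name
ORDER BY name;
```

For the sample data this selects positions 2 and 3 in the four-row groups and position 3 in the five-row group, so it should agree with the medians above (250, 350 and 151). It also scans the table once per query rather than running two subqueries per row.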
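Answer 1 (score: 2)
The only way I found to do this is through string manipulation: use GROUP_CONCAT to build a sorted list of all the values, then use nested SUBSTRING_INDEX calls to pick out the median value or values.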
SELECT
count(*) AS `# of Data points`,
name,
max((QNTY_Sell/QNTYDelivered)*1000) AS `MAX Thousand Price`,
min((QNTY_Sell/QNTYDelivered)*1000) AS `MIN Thousand Price`,
avg((QNTY_Sell/QNTYDelivered)*1000) AS `MEAN Thousand Price`
, CASE (count(*) % 2)
WHEN 1 THEN SUBSTRING_INDEX(
SUBSTRING_INDEX(
group_concat((QNTY_Sell/QNTYDelivered)*1000
ORDER BY (QNTY_Sell/QNTYDelivered)*1000 SEPARATOR ',')
, ',', (count(*) + 1) / 2)
, ',', -1)
ELSE (SUBSTRING_INDEX(
SUBSTRING_INDEX(
group_concat((QNTY_Sell/QNTYDelivered)*1000
ORDER BY (QNTY_Sell/QNTYDelivered)*1000 SEPARATOR ',')
, ',', count(*) / 2)
, ',', -1)
+ SUBSTRING_INDEX(
SUBSTRING_INDEX(
group_concat((QNTY_Sell/QNTYDelivered)*1000
ORDER BY (QNTY_Sell/QNTYDelivered)*1000 SEPARATOR ',')
, ',', count(*) / 2 + 1)
, ',', -1)) / 2
END median
FROM
sales
WHERE
year(date) >= 2012 AND
name LIKE "%art.%" AND
QNTYDelivered > 0 AND
QNTY_Sell > 0
GROUP BY name
ORDER BY name;
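The CASE is needed to check whether we have a single median value (an odd number of values) or two median values (an even number of values); in the second case the median is the mean of the two values.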
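To see how the nested SUBSTRING_INDEX calls pick an item out of the list, here is a small standalone sketch. The literal strings stand in for what GROUP_CONCAT would return; the values are borrowed from the sample data in the first answer purely for illustration, and the variable names are made up.

```sql
-- Odd count (5 values): the median is the item at position (5 + 1) / 2 = 3.
SET @odd_vals = '100.00,149.00,151.00,300.00,300.00';
-- The inner call keeps the first 3 items; the outer call keeps the last of those.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(@odd_vals, ',', 3), ',', -1) AS median;  -- '151.00'

-- Even count (4 values): average the items at positions 2 and 3.
SET @even_vals = '200.00,200.00,300.00,400.00';
SELECT ( SUBSTRING_INDEX(SUBSTRING_INDEX(@even_vals, ',', 2), ',', -1)
       + SUBSTRING_INDEX(SUBSTRING_INDEX(@even_vals, ',', 3), ',', -1) ) / 2 AS median;  -- 250
```

One thing to keep in mind with this approach: GROUP_CONCAT output is truncated at group_concat_max_len, which defaults to 1024 bytes, so for large groups it may be necessary to raise it first, for example with SET SESSION group_concat_max_len = 1000000;.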