Using MEDIAN alongside the MAX, MIN, and AVG functions in MySQL

Date: 2013-04-30 10:45:35

Tags: mysql sql statistics median

I have the following MySQL query, which works perfectly:

select 
    count(*) as `# of Data points`, 
    name, 
    max((QNTY_Sell/QNTYDelivered)*1000) as `MAX Thousand Price`,
    min((QNTY_Sell/QNTYDelivered)*1000) as `MIN Thousand Price`,
    avg((QNTY_Sell/QNTYDelivered)*1000) as `MEAN Thousand Price` 
from 
    table_name 
where 
    year(date) >= 2012 and 
    name like "%the_name%" and 
    QNTYDelivered > 0 and 
    QNTY_Sell > 0 
group by name 
order by name;

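Now I would also like to add a result column that gives the MEDIAN of the data for each row. In a perfect world, this would appear under SELECT as something like: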

median((QNTY_Sell/QNTYDelivered)*1000) as `MEDIAN Thousand Price`

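Googling for a MySQL median function led me to this answer, which seems fine if you want the median of the data set for an entire table: Simple way to calculate median with MySQL

The difference here is that I am grouping the data in my table by the name column, and I want the median for each row of that grouped result.

Does anyone know a nifty trick that would let me do this?

Thanks!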

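2 Answers:

Answer 0: (score: 3)

Even without a built-in function, you can calculate the median with GROUP BY in MySQL.

Consider the table (called sale in the queries below, with columns name and v):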

Acrington   200.00
Acrington   200.00
Acrington   300.00
Acrington   400.00
Bulingdon   200.00
Bulingdon   300.00
Bulingdon   400.00
Bulingdon   500.00
Cardington  100.00
Cardington  149.00
Cardington  151.00
Cardington  300.00
Cardington  300.00

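For each row, you can count how many items with the same name have a smaller value. You can also count how many have a value less than or equal to it: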

name        v       <   <=
Acrington   200.00  0   2
Acrington   200.00  0   2
Acrington   300.00  2   3
Acrington   400.00  3   4
Bulingdon   200.00  0   1
Bulingdon   300.00  1   2
Bulingdon   400.00  2   3
Bulingdon   500.00  3   4
Cardington  100.00  0   1
Cardington  149.00  1   2
Cardington  151.00  2   3
Cardington  300.00  3   5
Cardington  300.00  3   5

Using the query:

SELECT name,v, (SELECT COUNT(1) FROM sale WHERE v<o.v AND name=o.name) as ls
             , (SELECT COUNT(1) FROM sale WHERE v<=o.v AND name=o.name) as lse
  FROM sale o
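To check the counts in the table above, here is a minimal sketch that runs the same query against the sample rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL purely so it can be run anywhere; the schema (name TEXT, v REAL) and the helper name rank_counts are assumptions, not part of the original answer.

```python
import sqlite3

# Sample rows from the answer. The column names `name` and `v` follow the
# query; the column types are an assumption.
ROWS = [
    ("Acrington", 200.0), ("Acrington", 200.0),
    ("Acrington", 300.0), ("Acrington", 400.0),
    ("Bulingdon", 200.0), ("Bulingdon", 300.0),
    ("Bulingdon", 400.0), ("Bulingdon", 500.0),
    ("Cardington", 100.0), ("Cardington", 149.0), ("Cardington", 151.0),
    ("Cardington", 300.0), ("Cardington", 300.0),
]


def rank_counts():
    """Return (name, v, ls, lse) for every row, ordered by name and value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sale (name TEXT, v REAL)")
    conn.executemany("INSERT INTO sale VALUES (?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT name, v,
               (SELECT COUNT(1) FROM sale WHERE v <  o.v AND name = o.name) AS ls,
               (SELECT COUNT(1) FROM sale WHERE v <= o.v AND name = o.name) AS lse
          FROM sale o
         ORDER BY name, v
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in rank_counts():
        print(row)
```

The ORDER BY is added only to make the output stable; the two correlated subqueries are unchanged from the answer.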

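The median occurs where half the number of items falls between the less-than count and the less-than-or-equal count:

  • Acrington has 4 items. Half of that is 2, which falls in the range 0..2 (value 200.00) and in the range 2..3 (value 300.00).

  • Bulingdon also has 4 items. 2 falls in the range 1..2 (value 300.00) and in the range 2..3 (value 400.00).

  • Cardington has 5 items. Half of that is 2.5, which lies between 2 and 3 and corresponds to Cardington 151.00.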
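The selection rule in these bullets can also be written out in plain Python, which may make it easier to see why duplicates such as Acrington 200.00 are kept twice. This is only an illustrative sketch; the function name median_candidates is hypothetical.

```python
from collections import defaultdict


def median_candidates(rows):
    """For each name, keep the values v where ls <= n/2 <= lse.

    ls  = number of values in the group strictly less than v
    lse = number of values in the group less than or equal to v
    """
    groups = defaultdict(list)
    for name, v in rows:
        groups[name].append(v)

    result = {}
    for name, values in groups.items():
        half = len(values) * 0.5
        kept = []
        for v in values:
            ls = sum(1 for x in values if x < v)
            lse = sum(1 for x in values if x <= v)
            if ls <= half <= lse:
                kept.append(v)
        result[name] = sorted(kept)
    return result


if __name__ == "__main__":
    sample = [
        ("Acrington", 200.0), ("Acrington", 200.0),
        ("Acrington", 300.0), ("Acrington", 400.0),
        ("Cardington", 100.0), ("Cardington", 149.0), ("Cardington", 151.0),
        ("Cardington", 300.0), ("Cardington", 300.0),
    ]
    print(median_candidates(sample))
```

For an even-sized group two distinct values survive (possibly with repeats); for an odd-sized group with a unique middle value only that value survives.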

The median is the average of the minimum and maximum of the values returned by:

SELECT cs.name,v
   FROM
   (SELECT name,v, (SELECT COUNT(1) FROM sale WHERE v<o.v AND name=o.name) as ls
                 , (SELECT COUNT(1) FROM sale WHERE v<=o.v AND name=o.name) as lse
      FROM sale o) cs JOIN
   (SELECT name,COUNT(1)*.5 as cn
      FROM sale
      GROUP BY name) cc ON cs.name=cc.name
 WHERE cn between ls and lse

which gives:

Acrington   200.00
Acrington   200.00
Acrington   300.00
Bulingdon   300.00
Bulingdon   400.00
Cardington  151.00

Finally, we can get the median:

SELECT name,(MAX(v)+MIN(v))/2 FROM
(SELECT cs.name,v
   FROM
   (SELECT name,v, (SELECT COUNT(1) FROM sale WHERE v<o.v AND name=o.name) as ls
                 , (SELECT COUNT(1) FROM sale WHERE v<=o.v AND name=o.name) as lse
      FROM sale o) cs JOIN
   (SELECT name,COUNT(1)*.5 as cn
      FROM sale
     GROUP BY name) cc ON cs.name=cc.name
 WHERE cn between ls and lse
 ) AS medians
GROUP BY name

giving:

Acrington   250.000000
Bulingdon   350.000000
Cardington  151.000000
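As a sanity check, the final query can be run against the sample rows and compared with an independent median. The sketch below again uses sqlite3 as a stand-in for MySQL and Python's statistics.median as the reference. The schema, the explicit column aliases, the added ORDER BY, and the helper names are assumptions made for the sake of a runnable example.

```python
import sqlite3
from collections import defaultdict
from statistics import median

ROWS = [
    ("Acrington", 200.0), ("Acrington", 200.0),
    ("Acrington", 300.0), ("Acrington", 400.0),
    ("Bulingdon", 200.0), ("Bulingdon", 300.0),
    ("Bulingdon", 400.0), ("Bulingdon", 500.0),
    ("Cardington", 100.0), ("Cardington", 149.0), ("Cardington", 151.0),
    ("Cardington", 300.0), ("Cardington", 300.0),
]

# The answer's final query, with aliases spelled out for SQLite.
MEDIAN_SQL = """
SELECT name, (MAX(v) + MIN(v)) / 2 AS med FROM
 (SELECT cs.name AS name, cs.v AS v
    FROM
    (SELECT name, v,
            (SELECT COUNT(1) FROM sale WHERE v <  o.v AND name = o.name) AS ls,
            (SELECT COUNT(1) FROM sale WHERE v <= o.v AND name = o.name) AS lse
       FROM sale o) cs JOIN
    (SELECT name, COUNT(1) * 0.5 AS cn
       FROM sale
      GROUP BY name) cc ON cs.name = cc.name
   WHERE cn BETWEEN ls AND lse
 ) AS medians
GROUP BY name
ORDER BY name
"""


def sql_medians():
    """Median per name as computed by the grouped SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sale (name TEXT, v REAL)")
    conn.executemany("INSERT INTO sale VALUES (?, ?)", ROWS)
    result = dict(conn.execute(MEDIAN_SQL).fetchall())
    conn.close()
    return result


def reference_medians():
    """Median per name computed directly in Python, for comparison."""
    groups = defaultdict(list)
    for name, v in ROWS:
        groups[name].append(v)
    return {name: median(values) for name, values in groups.items()}


if __name__ == "__main__":
    print(sql_medians())
    print(reference_medians())
```

On this sample both approaches agree; the comparison says nothing about performance, and the correlated subqueries may be slow on large tables.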

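Answer 1: (score: 2)

The only way I found to do this is through string manipulation:
use GROUP_CONCAT to build a list of all the values, then use nested SUBSTRING_INDEX calls to pick out the median value.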

SELECT
    count(*) AS `# of Data points`,
    name,
    max((QNTY_Sell/QNTYDelivered)*1000) AS `MAX Thousand Price`,
    min((QNTY_Sell/QNTYDelivered)*1000) AS `MIN Thousand Price`,
    avg((QNTY_Sell/QNTYDelivered)*1000) AS `MEAN Thousand Price`
  , CASE (count(*) % 2)
    WHEN 1 THEN SUBSTRING_INDEX(
      SUBSTRING_INDEX(
        group_concat((QNTY_Sell/QNTYDelivered)*1000 
                      ORDER BY (QNTY_Sell/QNTYDelivered)*1000 SEPARATOR ',')
      , ',', (count(*) + 1) / 2)
    , ',', -1)
    ELSE (SUBSTRING_INDEX(
      SUBSTRING_INDEX(
        group_concat((QNTY_Sell/QNTYDelivered)*1000 
                      ORDER BY (QNTY_Sell/QNTYDelivered)*1000 SEPARATOR ',')
      , ',', count(*) / 2)
    , ',', -1)
  + SUBSTRING_INDEX(
      SUBSTRING_INDEX(
        group_concat((QNTY_Sell/QNTYDelivered)*1000 
                      ORDER BY (QNTY_Sell/QNTYDelivered)*1000 SEPARATOR ',')
      , ',', count(*) / 2 + 1)
    , ',', -1)) / 2
    END median
FROM
    sales
WHERE
    year(date) >= 2012 AND
    name LIKE "%art.%" AND
    QNTYDelivered > 0 AND
    QNTY_Sell > 0
GROUP BY name
ORDER BY name;  

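The CASE is needed to check whether we have a single middle value (an odd number of values) or two middle values (an even number of values); in the second case, the median is the mean of those two values.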
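To make the string trick easier to follow, here is a rough Python sketch of the same idea: build an ordered comma-separated list, then take the middle item (or the two middle items) with a pair of SUBSTRING_INDEX-style cuts. The helpers substring_index, nth_item, and median_from_concat are hypothetical names written for this illustration; substring_index only mimics the MySQL function for the cases used here.

```python
def substring_index(s, delim, count):
    """Mimic MySQL SUBSTRING_INDEX(s, delim, count) for the cases used here."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything right of the count-th delimiter from the end
    return ""


def nth_item(csv, n):
    """Return the n-th (1-based) item: cut to the first n items, then keep the last."""
    return substring_index(substring_index(csv, ",", n), ",", -1)


def median_from_concat(values):
    """Median via an ordered comma-separated string, as in the query above."""
    if not values:
        raise ValueError("median of an empty group is undefined")
    ordered = sorted(values)
    csv = ",".join(str(v) for v in ordered)  # stands in for GROUP_CONCAT(... ORDER BY ...)
    n = len(ordered)
    if n % 2 == 1:
        return float(nth_item(csv, (n + 1) // 2))
    lower = float(nth_item(csv, n // 2))
    upper = float(nth_item(csv, n // 2 + 1))
    return (lower + upper) / 2


if __name__ == "__main__":
    print(median_from_concat([200, 200, 300, 400]))
    print(median_from_concat([100, 149, 151, 300, 300]))
```

One practical caveat for the SQL version: GROUP_CONCAT output is truncated to group_concat_max_len (1024 by default), so large groups may need that setting raised for the middle items to be present in the string.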

SQLFiddle