Query to find the approximate median of binned data per category

Time: 2016-08-31 13:23:11

Tags: mysql sql

I have a table of binned data for different categories, for example:

category, bin, frequency
a, 0, 10
a, 1, 20
a, 2, 30
a, 3, 15
b, 0, 18
b, 1, 54
b, 2, 33
b, 3, 24

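I need to find the approximate median for each category. To do that, I want to compute a cumulative percentage histogram per category and take the first bin whose running percentage reaches 50% or more. I know how to do this for a single category: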

SELECT category, bin AS approx_median
FROM (
SELECT category, bin, frequency,
    (SELECT SUM(frequency) FROM table sub WHERE sub.bin <= base.bin)
    / (SELECT SUM(frequency) FROM table)
    * 100 AS running_percent
FROM table base
WHERE category = 'a'  -- string literal must be quoted
ORDER BY bin ) p
WHERE p.running_percent >= 50.0
LIMIT 1

The question is: how do I do this for all categories at once, so that I get the result

category, approx_median
a, 2
b, 1
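
The expected values above can be checked by hand from the sample data. The following is only a minimal sketch in plain Python (not SQL); the function name `approx_medians` and the `threshold` parameter are made up for illustration. It accumulates the frequencies per category in bin order and returns the first bin whose cumulative share reaches 50%.

```python
from itertools import accumulate

# Sample rows from the question: (category, bin, frequency).
rows = [
    ("a", 0, 10), ("a", 1, 20), ("a", 2, 30), ("a", 3, 15),
    ("b", 0, 18), ("b", 1, 54), ("b", 2, 33), ("b", 3, 24),
]

def approx_medians(rows, threshold=50.0):
    """Return {category: first bin whose cumulative percent >= threshold}."""
    by_category = {}
    for category, bin_, freq in rows:
        by_category.setdefault(category, []).append((bin_, freq))

    result = {}
    for category, bins in by_category.items():
        bins.sort()  # cumulative sums must follow bin order
        total = sum(freq for _, freq in bins)
        running = accumulate(freq for _, freq in bins)
        for (bin_, _), cum in zip(bins, running):
            if cum / total * 100 >= threshold:
                result[category] = bin_
                break
    return result

print(approx_medians(rows))  # {'a': 2, 'b': 1}
```

For category a the cumulative counts are 10, 30, 60, 75 out of 75, so 50% is first reached at bin 2; for category b they are 18, 72, 105, 129 out of 129, so it is first reached at bin 1.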

Thanks for any suggestions.

3 Answers:

Answer 0: (score: 2)

What you probably want is something like this:

SELECT category, MIN(bin) AS approx_median
FROM (
    SELECT base.category,
           base.bin,
           -- 100.0 forces decimal arithmetic where integer division would truncate
           100.0
           * (SELECT SUM(sub.frequency) FROM [table] sub
              WHERE sub.bin <= base.bin AND sub.category = base.category)
           / (SELECT SUM(sub.frequency) FROM [table] sub
              WHERE sub.category = base.category) AS running_percent
    FROM [table] base
) p
WHERE running_percent >= 50.0
GROUP BY category

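You need to group by category and reference the category inside the aggregate subqueries. If you are using SQL Server 2012 or later, you can use window functions instead; see the example "ABC-Analysis with Window Function".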
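
As a rough sketch of the window-function variant mentioned above, the query below replaces the correlated subqueries with `SUM(...) OVER (PARTITION BY ...)`. It is run through Python's `sqlite3` module purely so that it is self-contained; this assumes the bundled SQLite is 3.25 or newer (the first version with window functions). The table name `binned` is hypothetical, chosen because `table` is a reserved word. The same SQL should also be accepted by MySQL 8.0+, which added window functions, but that is not verified here.

```python
import sqlite3

# Hypothetical table name `binned`; sample data taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE binned (category TEXT, bin INTEGER, frequency INTEGER)")
conn.executemany(
    "INSERT INTO binned VALUES (?, ?, ?)",
    [
        ("a", 0, 10), ("a", 1, 20), ("a", 2, 30), ("a", 3, 15),
        ("b", 0, 18), ("b", 1, 54), ("b", 2, 33), ("b", 3, 24),
    ],
)

QUERY = """
SELECT category, MIN(bin) AS approx_median
FROM (
    SELECT category, bin,
           -- running total within the category, divided by the category total
           100.0 * SUM(frequency) OVER (PARTITION BY category ORDER BY bin)
                 / SUM(frequency) OVER (PARTITION BY category) AS running_percent
    FROM binned
) p
WHERE running_percent >= 50.0
GROUP BY category
ORDER BY category
"""

medians = conn.execute(QUERY).fetchall()
print(medians)  # [('a', 2), ('b', 1)]
```

Partitioning by category plays the same role as the `sub.category = base.category` condition in the subquery version: each running total and each denominator is restricted to one category.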

Answer 1: (score: 0)

You could use the IN operator; I don't know whether it will help. Give it a try.

SELECT category, bin as approx_median
FROM (
SELECT category, bin, frequency,
    (SELECT SUM(frequency) FROM table sub WHERE sub.bin <= base.bin) 
    / (SELECT SUM(frequency) FROM table) 
    * 100 as running_percent    
FROM table base
WHERE category in (select distinct category from table)
ORDER BY bin ) p
WHERE p.running_percent >= 50.0
LIMIT 1

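Answer 2: (score: 0)

If the query you posted really does what you want, just remove the condition WHERE category = a and try it. Your running_percent calculation is based on the bin column anyway. You can additionally order the outer query by category to make the output look nicer.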