I have a table of binned data for several categories, for example:
category, bin, frequency
a, 0, 10
a, 1, 20
a, 2, 30
a, 3, 15
b, 0, 18
b, 1, 54
b, 2, 33
b, 3, 24
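I need to find the approximate median for each category. To do that, I want to compute the cumulative percentage histogram for each category and take the first bin whose cumulative value reaches 50%. I know how to do this for a single category: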
SELECT category, bin AS approx_median
FROM (
    SELECT category, bin, frequency,
           100.0 * (SELECT SUM(frequency) FROM table sub
                    WHERE sub.category = base.category AND sub.bin <= base.bin)
                 / (SELECT SUM(frequency) FROM table sub
                    WHERE sub.category = base.category) AS running_percent
    FROM table base
    WHERE category = 'a'
) p
WHERE p.running_percent >= 50.0
ORDER BY bin
LIMIT 1
The question is how to do this for all categories at once, so that I get this result:
category, approx_median
a, 2
b, 1
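For reference, the expected values can be checked by hand from the sample rows. The sketch below is plain Python rather than SQL and only reproduces the arithmetic; the function name `approx_medians` is made up for this illustration.

```python
from itertools import accumulate

# Sample rows from the table above: (category, bin, frequency)
rows = [
    ("a", 0, 10), ("a", 1, 20), ("a", 2, 30), ("a", 3, 15),
    ("b", 0, 18), ("b", 1, 54), ("b", 2, 33), ("b", 3, 24),
]

def approx_medians(rows):
    """Return {category: first bin whose cumulative share reaches 50%}."""
    # Group the (bin, frequency) pairs by category, in bin order.
    by_category = {}
    for category, bin_, frequency in sorted(rows):
        by_category.setdefault(category, []).append((bin_, frequency))

    medians = {}
    for category, bins in by_category.items():
        total = sum(freq for _, freq in bins)
        running = accumulate(freq for _, freq in bins)
        for (bin_, _), cumulative in zip(bins, running):
            # Running percentage within this category only.
            if 100.0 * cumulative / total >= 50.0:
                medians[category] = bin_
                break
    return medians

print(approx_medians(rows))
```

For category a the cumulative counts are 10, 30, 60, 75 out of 75, so 50% is first reached at bin 2; for category b they are 18, 72, 105, 129 out of 129, so 50% is first reached at bin 1.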
Thanks for any suggestions.
Answer 0 (score: 2)
What you probably want is something like this:
SELECT category, MIN(bin) AS approx_median
FROM (
    SELECT base.category,
           base.bin,
           -- 100.0 forces decimal arithmetic; integer division would give 0
           100.0 * (SELECT SUM(sub.frequency) FROM [table] sub
                    WHERE sub.bin <= base.bin AND sub.category = base.category)
                 / (SELECT SUM(sub.frequency) FROM [table] sub
                    WHERE sub.category = base.category) AS running_percent
    FROM [table] base
) p
WHERE running_percent >= 50.0
GROUP BY category
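You need to group by category and restrict both aggregate subqueries to the current row's category. If you are on SQL Server 2012 or later, you can use window functions instead. For an example, see "ABC-Analysis with Window Function".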
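A minimal sketch of the window-function variant, under some assumptions: it runs the SQL through Python's built-in `sqlite3` module against an in-memory database (SQLite 3.25 or later is needed for window functions) rather than on SQL Server, and the table name `binned` is made up for this illustration. The `SUM(...) OVER (PARTITION BY ... ORDER BY ...)` syntax itself is standard SQL.

```python
import sqlite3

# In-memory database holding the sample data from the question.
# The table name "binned" is a placeholder chosen for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE binned (category TEXT, bin INTEGER, frequency INTEGER)"
)
conn.executemany(
    "INSERT INTO binned VALUES (?, ?, ?)",
    [
        ("a", 0, 10), ("a", 1, 20), ("a", 2, 30), ("a", 3, 15),
        ("b", 0, 18), ("b", 1, 54), ("b", 2, 33), ("b", 3, 24),
    ],
)

QUERY = """
SELECT category, MIN(bin) AS approx_median
FROM (
    SELECT category,
           bin,
           -- running total within the category, divided by the category total
           100.0 * SUM(frequency) OVER (PARTITION BY category ORDER BY bin)
                 / SUM(frequency) OVER (PARTITION BY category) AS running_percent
    FROM binned
) p
WHERE running_percent >= 50.0
GROUP BY category
ORDER BY category
"""

result = conn.execute(QUERY).fetchall()
print(result)
```

The window sums replace both correlated subqueries, so the table is not rescanned for every row. The outer `MIN(bin)` with `GROUP BY category` plays the same role as in the query above.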
Answer 1 (score: 0)
You could use the IN operator. I don't know whether it will help, but give it a try.
SELECT category, bin as approx_median
FROM (
SELECT category, bin, frequency,
(SELECT SUM(frequency) FROM table sub WHERE sub.bin <= base.bin)
/ (SELECT SUM(frequency) FROM table)
* 100 as running_percent
FROM table base
WHERE category in (select distinct category from table)
ORDER BY bin ) p
WHERE p.running_percent >= 50.0
LIMIT 1
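Answer 2 (score: 0)
If the query you posted really does what you want, then just remove the condition WHERE category = 'a' and try it. Your running_percent calculation is based on the bin column anyway. You can additionally order the outer query by category to make the output easier to read.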