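I was able to write a query that finds the median using the logic below, but I'm having trouble understanding that logic. Can someone help me understand what is going on? I got the code from an advanced SQL book.
The code works whether the table has an odd or an even number of rows. I tried it and it works, but I'd like to understand why.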
select avg(sales) as median
from
    (select g1.sales
     from ga g1, ga g2
     group by g1.sales
     having sum(case when g1.sales = g2.sales then 1 ELSE 0 END) >= ABS(SUM(SIGN(g1.sales-g2.sales)))) g3;
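One way to check the claim that the query handles both odd and even row counts is to run it against a few small data sets. The sketch below assumes SQLite, accessed through Python's built-in sqlite3 module (the question does not say which database the book targets). The table ga and column sales come from the query; the sample values and the helper name median_via_sql are made up for illustration.

```python
import sqlite3

# The median query from the question, unchanged apart from whitespace.
MEDIAN_SQL = """
select avg(sales) as median
from (select g1.sales
      from ga g1, ga g2
      group by g1.sales
      having sum(case when g1.sales = g2.sales then 1 else 0 end)
             >= abs(sum(sign(g1.sales - g2.sales)))) g3;
"""


def median_via_sql(values):
    """Load values into an in-memory table ga(sales) and run MEDIAN_SQL."""
    conn = sqlite3.connect(":memory:")
    # Older SQLite builds have no built-in sign(); registering one keeps the
    # query portable (an application-defined function overrides a built-in).
    conn.create_function("sign", 1, lambda x: (x > 0) - (x < 0))
    conn.execute("create table ga (sales integer)")
    conn.executemany("insert into ga (sales) values (?)", [(v,) for v in values])
    (median,) = conn.execute(MEDIAN_SQL).fetchone()
    conn.close()
    return median


print(median_via_sql([10, 20, 30, 40, 50]))  # odd count: 30.0
print(median_via_sql([10, 20, 30, 40]))      # even count: 25.0
print(median_via_sql([10, 20, 20, 30]))      # duplicated middle value: 20.0
```

SQLite's avg() always returns a floating-point value, which is why the results print as 30.0 and not 30. Some other databases return an integer from avg() over an integer column, which is what the sales * 1.0 in the answer below guards against.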
Answer:
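The query groups the Cartesian product by sales value to find the middle one or two sales values, and averaging those values gives the median. See the comments in the query for details.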
--the subquery returns the 1 or 2 middle values, and their average is the median
select avg(sales * 1.0) as median
from (
    select g1.sales
    --this creates a Cartesian product of ga with itself
    from ga g1, ga g2
    --grouping by g1.sales compares each distinct sales figure with every row of ga
    group by g1.sales
    having
        --the sum(case) counts the rows in the group whose sales values match;
        --for a value that appears c times in ga this is c * c, the square of its count
        sum(
            case
                when g1.sales = g2.sales
                then 1
                ELSE 0
            END)
        >= --the comparison de-weights duplicate sales figures: both sides are
           --multiplied by c, so the test is effectively c >= |rows below - rows above|.
           --For example, a sales figure that appears twice gives 4 matching rows on
           --the left, and the right side is 2 * |rows below - rows above|.
        --abs(sum(sign())) measures how far each sales figure is from the middle:
        --every row with a smaller value adds +1, every row with a larger value adds -1,
        --and equal values add 0. The value(s) at or nearest the median have the
        --smallest imbalance.
        ABS(
            SUM(
                SIGN(g1.sales - g2.sales)
            )
        )
    ) g3;
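To see the two sides of the HAVING test for each group, the sketch below builds the same self-join in plain Python instead of SQL. It is an illustration, not code from the book: having_terms and median_like_query are hypothetical helper names, and the input list stands in for the ga.sales column. For a value that appears c times, with "below" rows smaller and "above" rows larger, the left side is c * c and the right side is c * |below - above|. The group therefore survives exactly when c >= |below - above|, which holds only for the value(s) occupying the middle position(s) of the sorted list.

```python
from itertools import product


def having_terms(values):
    """Mirror the grouped self-join: for each distinct value (g1.sales),
    accumulate both HAVING sums over every pairing with g2."""
    sums = {}
    for g1, g2 in product(values, repeat=2):     # from ga g1, ga g2
        eq, sgn = sums.get(g1, (0, 0))
        eq += 1 if g1 == g2 else 0               # sum(case when g1.sales = g2.sales ...)
        sgn += (g1 > g2) - (g1 < g2)             # sum(sign(g1.sales - g2.sales))
        sums[g1] = (eq, sgn)
    # (left side, right side, does the group survive the HAVING test?)
    return {v: (eq, abs(sgn), eq >= abs(sgn)) for v, (eq, sgn) in sorted(sums.items())}


def median_like_query(values):
    """Average the surviving distinct values, as the outer avg() does."""
    kept = [v for v, (_, _, keep) in having_terms(values).items() if keep]
    return sum(kept) / len(kept)


sales = [10, 20, 20, 30, 40]
print(having_terms(sales))
# {10: (1, 4, False), 20: (4, 2, True), 30: (1, 2, False), 40: (1, 4, False)}
print(median_like_query(sales))  # 20.0
```

In this list, 20 appears twice, so its group has 4 matching rows against an imbalance of 2 (one row below and two above, times two copies) and is the only value kept. With an even count such as [10, 20, 30, 40], both 20 and 30 pass with 1 >= 1, and their average is 25.0.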