Median in SQL using ABS

Time: 2014-10-09 05:14:42

Tags: mysql sql statistics median

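I was able to write a query that finds the median using the logic below, but I am having trouble understanding that logic. Can someone help me understand what is happening? I got the code from an advanced SQL book.

The code works for both odd and even numbers of rows. I tried it and it works, but I am curious to understand the logic.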

select avg(sales) as median
from 
(select g1.sales
from ga g1, ga g2
group by g1.sales
having sum(case when g1.sales = g2.sales then 1 ELSE 0 END) >= ABS(SUM(SIGN(g1.sales-g2.sales))))g3;

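1 answer:

Answer 0: (score: 1)

The query groups the Cartesian product of the table with itself by sales value to find the 1 or 2 "middle" sales values, then averages them to get the median. See the detailed comments in the query.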

--the subquery returns the 1 or 2 middle values; the average of those is the median
select avg(sales * 1.0) as median 
  from (
    select g1.sales
           --this creates a Cartesian product of ga with itself
      from ga g1, ga g2 
           --then group by sales, which allows comparison of each sales figure with all others
     group by g1.sales 
    having 
      --the sum(case) here counts the rows in the Cartesian product that have matching sales values;
      --for each sales figure this is the square of its count() in ga
      sum(
        case 
          when g1.sales = g2.sales 
          then 1 
          ELSE 0 
        END) 

       >= --the comparison acts as a de-weighting mechanism to handle duplicate sales figures:
          --if the same sales figure appears twice, it has 4 matching rows in the Cartesian product,
          --and the right-hand side is doubled as well, so for a figure that appears c times
          --the test reduces to c >= abs(count of lesser sales - count of greater sales)

       --the abs(sum(sign())) here measures how far each sales figure is from the median
       --by comparing how many sales are less than it with how many are greater. The sales at or
       --nearest the median will have the lowest numbers here.
       ABS(
         SUM(
           SIGN(g1.sales-g2.sales)
         )
       )
  ) g3;
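
To see both sides of the HAVING comparison on concrete data, the query can be run against a small table. The following is only a sketch under some assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database instead of MySQL, the sample sales figures are made up, and the helper names (sign, load, sql_median, breakdown) are hypothetical. SIGN is registered as a Python function so the sketch does not depend on the SQLite version providing it.

```python
import sqlite3
import statistics

# The median query from the answer, unchanged apart from whitespace.
MEDIAN_SQL = """
select avg(sales * 1.0) as median
  from (
    select g1.sales
      from ga g1, ga g2
     group by g1.sales
    having sum(case when g1.sales = g2.sales then 1 else 0 end)
           >= abs(sum(sign(g1.sales - g2.sales)))
  ) g3
"""

# Same grouping, but exposing both sides of the HAVING test per sales value.
BREAKDOWN_SQL = """
select g1.sales,
       sum(case when g1.sales = g2.sales then 1 else 0 end) as equal_pairs,
       abs(sum(sign(g1.sales - g2.sales))) as imbalance
  from ga g1, ga g2
 group by g1.sales
 order by g1.sales
"""


def sign(x):
    """Return -1, 0 or 1, like SQL SIGN() (assumes x is not NULL)."""
    return (x > 0) - (x < 0)


def load(values):
    """Create an in-memory table ga(sales) holding the given sample values."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("sign", 1, sign)
    conn.execute("create table ga (sales integer)")
    conn.executemany("insert into ga (sales) values (?)", [(v,) for v in values])
    return conn


def sql_median(values):
    """Run the median query on the sample values and return its single result."""
    conn = load(values)
    try:
        return conn.execute(MEDIAN_SQL).fetchone()[0]
    finally:
        conn.close()


def breakdown(values):
    """Return (sales, equal_pairs, imbalance) for each distinct sales value."""
    conn = load(values)
    try:
        return conn.execute(BREAKDOWN_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [1, 1, 2, 3]  # made-up data with a duplicate and an even row count
    for sales, equal_pairs, imbalance in breakdown(sample):
        kept = equal_pairs >= imbalance
        print(sales, equal_pairs, imbalance, "kept" if kept else "dropped")
    print("sql median:", sql_median(sample))
    print("statistics.median:", statistics.median(sample))
```

For the sample [1, 1, 2, 3], the value 1 has 4 matching pairs against an imbalance of 4, and the value 2 has 1 against 1, so both are kept and their average is 1.5; the value 3 has 1 against 3 and is dropped.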