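Computing a median with GROUP BY in AWS Redshift

Asked: 2015-02-13 19:27:06

Tags: postgresql group-by amazon-redshift window-functions

I've seen other posts about using the median() window function in Redshift, but how would you use it in a query that has a GROUP BY at the end?

For example, assume a table course: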

Course | Subject | Num_Students
-------------------------------
   1   |  Math   |      4
   2   |  Math   |      6
   3   |  Math   |      10
   4   | Science |      2
   5   | Science |      10
   6   | Science |      12
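
For anyone who wants to reproduce this, a minimal setup might look like the sketch below. The column types are assumptions; the question only shows the column names and values.

```sql
-- Hypothetical setup: the column types are assumed, not taken from the question.
CREATE TABLE course (
    course       INT,
    subject      VARCHAR(20),
    num_students INT
);

-- Sample rows copied from the table above.
INSERT INTO course (course, subject, num_students) VALUES
    (1, 'Math',    4),
    (2, 'Math',    6),
    (3, 'Math',    10),
    (4, 'Science', 2),
    (5, 'Science', 10),
    (6, 'Science', 12);
```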

I want to get the median number of students for each subject. How would I write a query that gives the following result:

  Subject  | Median
-----------------------
 Math      |     6
 Science   |     10

I've tried:

SELECT
subject, median(num_students) over ()
FROM
course
GROUP BY 1
;

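But it lists every occurrence of each subject, with the same median for all subjects (this is fake data, so the actual value returned is not 6; the point is that it is identical across all subjects):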

  Subject  | Median
-----------------------
 Math      |     6
 Math      |     6
 Math      |     6
 Science   |     6
 Science   |     6
 Science   |     6

4 answers:

Answer 0 (score: 6)

The following will give you the result you are looking for:

SELECT DISTINCT
subject, median(num_students) OVER (PARTITION BY subject)
FROM
course
ORDER BY subject;

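Answer 1 (score: 2)

You just need to remove the "over()" part. Without it, median() acts as an ordinary aggregate, so GROUP BY returns one row per subject.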

SELECT subject, median(num_students) FROM course GROUP BY 1;
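
The Redshift documentation describes MEDIAN as a special case of PERCENTILE_CONT(0.5), so the same grouping could also be sketched with the ordered-set form. This variant should also work on PostgreSQL 9.4 or later, which has no built-in median(). The alias median_students is only illustrative.

```sql
-- Median per subject via the ordered-set aggregate;
-- intended to be equivalent to median(num_students) with GROUP BY.
SELECT
    subject,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY num_students) AS median_students
FROM course
GROUP BY subject
ORDER BY subject;
```

With the sample data from the question, this should return one row per subject with the medians 6 and 10.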

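Answer 2 (score: 1)

You haven't defined a partition in the window. An empty OVER() treats the whole table as a single window, which is why every row shows the same median. Instead of OVER() you need OVER(PARTITION BY subject).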

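Answer 3 (score: 0)

Let's say you also want to calculate other aggregates by subject, such as avg(). With the window form of median(), you can compute the median in a subquery (here a CTE) and then group in the outer query: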

WITH subject_numstudents_medianstudents AS (
    SELECT
        subject
        , num_students
        , median(num_students) over (partition BY subject) AS median_students
    FROM
        course
)
SELECT
    subject
    , median_students
    , avg(num_students) as avg_students
FROM subject_numstudents_medianstudents
GROUP BY 1, 2;
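
If median() is used as a plain aggregate, as in answer 1, the CTE may not be necessary. The following is a sketch under that assumption rather than a tested replacement; the aliases are illustrative.

```sql
-- Aggregate form: median and average per subject in a single GROUP BY.
-- Assumes MEDIAN is available as an aggregate function (as it is in Redshift).
SELECT
    subject,
    MEDIAN(num_students) AS median_students,
    AVG(num_students)    AS avg_students
FROM course
GROUP BY subject
ORDER BY subject;
```

The CTE version is still useful when the median has to sit next to per-row columns, since the window form keeps every input row.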