I've seen other posts about using the median() window function in Redshift, but how do you use it in a query that ends with a GROUP BY?
For example, assume a table named course:
Course | Subject | Num_Students
-------------------------------
1 | Math | 4
2 | Math | 6
3 | Math | 10
4 | Science | 2
5 | Science | 10
6 | Science | 12
I want to get the median number of students for each subject. How do I write a query that returns the following result:
Subject | Median
-----------------------
Math | 6
Science | 10
I tried:
SELECT
subject, median(num_students) over ()
FROM
course
GROUP BY 1
;
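But it lists every occurrence of each subject, and shows the same median for all of them (this is fake data, so the actual value returned is not 6; the point is that every subject shows the same value):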
Subject | Median
-----------------------
Math | 6
Math | 6
Math | 6
Science | 6
Science | 6
Science | 6
Answer 0: (score: 6)
The following will give you the result you are looking for:
SELECT distinct
subject, median(num_students) over(partition by Subject)
FROM
course
order by Subject;
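Answer 1: (score: 2)
You just need to remove the "over()" part, so that median() is used as a plain aggregate together with GROUP BY: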
SELECT subject, median(num_students) FROM course GROUP BY 1;
Answer 2: (score: 1)
You have not defined a partition in the window. Instead of OVER() you need OVER(PARTITION BY subject).
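To see why the empty OVER() repeats one value on every row, here is a minimal sketch in Python (not Redshift SQL) that mimics the two window definitions on the sample data from the question. The variable names are made up for illustration, and Python's statistics.median averages the two middle values for an even-sized set, which I am assuming matches how the median is taken here.

```python
from collections import defaultdict
from statistics import median

# Sample rows from the question's course table: (course, subject, num_students)
rows = [
    (1, "Math", 4),
    (2, "Math", 6),
    (3, "Math", 10),
    (4, "Science", 2),
    (5, "Science", 10),
    (6, "Science", 12),
]

# Empty OVER (): a single window covering every row, so one median for all rows.
median_all = median(n for _, _, n in rows)

# OVER (PARTITION BY subject): one window, and therefore one median, per subject.
by_subject = defaultdict(list)
for _, subject, n in rows:
    by_subject[subject].append(n)
median_by_subject = {s: median(v) for s, v in by_subject.items()}

print(median_all)
print(median_by_subject)
```

With a single window every row gets the median of all six values, while partitioning by subject gives Math 6 and Science 10, which is the result the question asks for.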
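Answer 3: (score: 0)
Suppose you also want to compute other aggregates per subject, such as avg(), alongside the window median. In that case you need a subquery (written here as a CTE):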
WITH subject_numstudents_medianstudents AS (
SELECT
subject
, num_students
, median(num_students) over (partition BY subject) AS median_students
FROM
course
)
SELECT
subject
, median_students
, avg(num_students) as avg_students
FROM subject_numstudents_medianstudents
GROUP BY 1, 2